Abstract

We show an optimal data-dependent hashing scheme for the approximate near neighbor 
problem. For an n-point dataset in a d-dimensional space our data structure achieves query 
time $O(d \cdot n^{\rho+o(1)})$ and space $O(n^{1+\rho+o(1)} + d \cdot n)$, where $\rho = \frac{1}{2c^2-1}$ for the Euclidean space and
approximation c > 1. For the Hamming space, we obtain an exponent of $\rho = \frac{1}{2c-1}$.

Our result completes the direction set forth in [AINR14] who gave a proof-of-concept that 
data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast 
to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) 
LSH data structures [IM98, AI06] for all approximation factors c > 1. 

From the technical perspective, we proceed by decomposing an arbitrary dataset into several 
subsets that are, in a certain sense, pseudo-random.

1 Introduction

In the near neighbor search problem, we are given a set P of n points in a d-dimensional space, and 
the goal is to build a data structure that, given a query point q, reports any point within a given 
distance r to the query. The problem is of major importance in several areas, such as databases, 
data mining, information retrieval, computer vision, computational geometry, signal processing, etc. 

Efficient near(est) neighbor algorithms are known for the case when the dimension d is “low” 
(e.g., see [Cla88, Mei93]). However, the current solutions suffer from “the curse of dimensionality” 
phenomenon: either space or query time are exponential in the dimension d. To escape this curse,
researchers proposed approximation algorithms for the problem. In the (c, r)-approximate near
neighbor problem (ANN), the data structure may return any data point whose distance from the 
query is at most cr, for an approximation factor c > 1 (provided that there exists a data point 
within distance r from the query). Many approximation algorithms for the problem are known: e.g., 
see surveys [AI08, And09]. 

To address the ANN problem, Indyk and Motwani proposed the Locality Sensitive Hashing 
scheme (LSH), which has since proved to be influential in theory and practice [IM98, HPIM12]. In
particular, LSH yields the best ANN data structures for the regime of sub-quadratic space and 
constant approximation factor, which turns out to be the most important regime from the practical 
perspective. The main idea is to hash the points such that the probability of collision is much 
higher for points which are close to each other (at distance $\le r$) than for those which are far apart
(at distance $\ge cr$). Given such hash functions, one can retrieve near neighbors by hashing the
query point and retrieving elements stored in buckets containing that point. If the probability of
collision is at least $p_1$ for the close points and at most $p_2$ for the far points, the algorithm solves the
(c, r)-ANN using $n^{1+\rho}$ extra space and $d n^{\rho}$ query time¹, where $\rho = \log(1/p_1)/\log(1/p_2)$ [HPIM12].
The value of the exponent $\rho$ thus determines the “quality” of the LSH families used.
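To make the definition of the exponent concrete, here is a minimal sketch that evaluates $\rho = \log(1/p_1)/\log(1/p_2)$ for given collision probabilities. The function names are hypothetical, and the bit-sampling family (one random coordinate of a point in $\{0,1\}^d$, which collides with probability 1 minus the normalized Hamming distance) is used only as a familiar illustration of a classical LSH family.

```python
import math


def lsh_exponent(p1: float, p2: float) -> float:
    """Return rho = log(1/p1) / log(1/p2) for collision probabilities p1 > p2."""
    if not (0.0 < p2 < p1 < 1.0):
        raise ValueError("need 0 < p2 < p1 < 1")
    return math.log(1.0 / p1) / math.log(1.0 / p2)


def bit_sampling_exponent(c: float, r: float, d: int) -> float:
    """Exponent of bit sampling in Hamming space {0,1}^d (illustrative only).

    Two points at Hamming distance t agree on a uniformly random coordinate
    with probability 1 - t/d, so p1 = 1 - r/d and p2 = 1 - c*r/d.
    """
    return lsh_exponent(1.0 - r / d, 1.0 - c * r / d)


if __name__ == "__main__":
    # For small r/d the exponent approaches 1/c from below.
    print(bit_sampling_exponent(c=2.0, r=10.0, d=1000))
```

A smaller $\rho$ means both less space ($n^{1+\rho}$) and faster queries ($dn^{\rho}$), which is why the rest of the discussion focuses on this single number.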

A natural question emerged: what is the best possible exponent $\rho$? The original LSH paper
[IM98] showed $\rho \le 1/c$ for both Hamming and Euclidean spaces. Focusing on the Euclidean space,
subsequent research showed that one can obtain a better exponent: $\rho \le 1/c^2$ [DIIM04, AI06]².
Complementing these results, lower bounds on $\rho$ showed that this bound is tight [MNP07, OWZ11],
thus settling the question: the best exponent is $\rho = 1/c^2$ for the Euclidean space.

Data-dependent hashing. Surprisingly, while the best possible LSH exponent $\rho$ has been
settled, it turns out there exist more efficient ANN data structures, which step outside the LSH
framework. In particular, [AINR14] obtain the exponent of $\rho = \frac{7}{8c^2} + O\!\left(\frac{1}{c^3}\right)$ by considering
data-dependent hashing, i.e., a randomized hash family that itself depends on the actual points in the
dataset. We stress that this approach gives improvement for worst-case datasets, which is somewhat 
unexpected. To put this into perspective: if one were to assume that the dataset has some special
structure, it would be more natural to expect speed-ups with data-dependent hashing: such hashing
may adapt to the special structure, perhaps implicitly, as was done in, say, [DF08, VKD09, AAKK14].
However, in our setting there is no assumed structure to adapt to, and hence it is unclear why
data-dependent hashing should help. (To compare with classical, non-geometric hashing, the most
similar situation where data-dependent hashing helps in the worst-case seems to be the perfect 
hashing [FKS84].) Note that for the case of Hamming space, [AINR14] has been the first and only 
improvement over [IM98] since the introduction of LSH (see Section 1.3). 

Thus the core question resurfaced: what is the best possible exponent for data-dependent hashing 
schemes? To formulate the question correctly, we also need to require that the hash family is “nice”:
otherwise, the trivially best solution is the Voronoi diagram of n points at hand, which is obviously
useless (computing the hash function is as hard as the original problem!). A natural way to preclude
such non-viable solutions is to require that each hash function can be “efficiently described”, i.e.,
it has a description of $n^{1-\Omega(1)}$ bits (e.g., this property is satisfied by all the LSH functions we are
aware of).

¹ For exposition purposes, we are suppressing the time to compute hash functions, which we assume to require $n^{o(1)}$
time and $n^{o(1)}$ space. We also assume distances can be computed in $O(d)$ time, and that $1/p_1 = n^{o(1)}$.

² Ignoring terms vanishing with n; the exact dependence is $\rho = 1/c^2 + 1/\log^{\Omega(1)} n$.


1.1 Main result 


We present an optimal data-dependent hash family that achieves the following exponent³ for ANN
under the Euclidean distance:

$$\rho = \frac{1}{2c^2 - 1}. \tag{1}$$


Specifically, we obtain the following main theorem. 


Theorem 1.1. For fixed approximation c > 1 and threshold r > 0, one can solve the (c, r)-ANN
problem in d-dimensional Euclidean space on n points with $O(d \cdot n^{\rho+o(1)})$ query time, $O(n^{1+\rho+o(1)} + d \cdot n)$
space, and $O(d \cdot n^{1+\rho+o(1)})$ preprocessing time, where $\rho = \frac{1}{2c^2-1}$.

The optimality of our bound (1) is established in a separate paper [AR15]. There we build on a 
data-independent lower bound for a random dataset [MNP07]. First, we improve upon [MNP07] 
quantitatively and obtain the lower bound $\rho \ge \frac{1}{2c^2-1} - o(1)$ for the data-independent case, thus
providing an alternative self-contained proof of the lower bound from [Dub10]. Second, we argue
that if there is a good data-dependent hashing family for a random dataset with hash functions
having low description complexity, then it can be converted into a data-independent family. Hence,
the lower bound $\rho \ge \frac{1}{2c^2-1} - o(1)$ applies to the data-dependent case as well. For the details, we
refer readers to [AR15]. 

An important aspect of our algorithm is that it effectively reduces ANN on a generic dataset to 
ANN on an (essentially) random dataset. The latter is the most natural “hard distribution” for 
the ANN problem. Besides the aforementioned lower bounds, it is also a source of cell-probe lower 
bounds for ANN [PTW08, PTW10]. Hence, looking forward, to further improve the efficiency of 
ANN, one would have first to improve the random dataset case, which seems to require fundamentally 
different techniques, if possible at all. 

The importance of this reduction can be seen from the progress on the closest pair problem 
by Valiant [Val12]. In particular, Valiant gives an algorithm with $n^{1.62} \cdot \mathrm{poly}\!\left(\frac{d}{c-1}\right)$ runtime for the
aforementioned random instances.⁴ Obtaining similar runtime for the worst case (e.g., via a similar
reduction) would refute the Strong Exponential-Time Hypothesis (SETH). 

³ Again, we ignore the additive term that vanishes with n.

⁴ Note that this improves over [Dub10]: Valiant exploits fast matrix multiplication algorithms to step outside
Dubiner’s hashing framework altogether.

We also point out that, besides achieving the optimal bound, the new algorithm has two further
advantages over the one from [AINR14]. First, our bound (1) is better than the optimal LSH bound
$1/c^2$ for every c > 1 (the bound from [AINR14] is only better for sufficiently large c). Second, the
preprocessing time of our algorithm is near-linear in the amount of space used, improving over the
quadratic preprocessing time of [AINR14].
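The claim that bound (1) beats the classical bound $1/c^2$ for every c > 1 follows from $2c^2 - 1 > c^2$ whenever $c^2 > 1$. The short sketch below tabulates both exponents for a few approximation factors; the function names and the dataset size are illustrative assumptions, and the vanishing $o(1)$ terms are ignored.

```python
def rho_classical_lsh(c: float) -> float:
    """Optimal data-independent LSH exponent for Euclidean space: 1/c^2."""
    return 1.0 / (c * c)


def rho_data_dependent(c: float) -> float:
    """Exponent from bound (1): 1/(2c^2 - 1)."""
    return 1.0 / (2.0 * c * c - 1.0)


if __name__ == "__main__":
    n = 10**6  # hypothetical dataset size, only used to show the scale of n^rho
    for c in (1.1, 1.5, 2.0, 3.0):
        a = rho_classical_lsh(c)
        b = rho_data_dependent(c)
        print(f"c={c}: 1/c^2={a:.4f} (n^rho ~ {n ** a:.0f}), "
              f"1/(2c^2-1)={b:.4f} (n^rho ~ {n ** b:.0f})")
```

As c approaches 1 both exponents approach 1, and for large c the new exponent is roughly half of the classical one.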

1.2 Techniques 

The general approach is via data-dependent LSH families, which can be equivalently seen as data- 
dependent random space partitions. Such space partitions are usually constructed iteratively: first 
we partition the space very coarsely, then we refine the partition iteratively a number of times. In 
standard LSH, each iteration of partitioning is random i.i.d., and the overall data structure consists 
of $n^{\rho}$ such iterative space partitions, constructed independently (see [HPIM12] for details).

For the latter discussion, it is useful to keep in mind what the random dataset instances
for ANN are. Consider a sphere of radius $cr/\sqrt{2}$ in $\mathbb{R}^d$ for $d = 1000 \log n$. The data set is obtained by
sampling n points on the sphere uniformly at random. To generate a query, one chooses a data
point uniformly at random, and plants a query within distance r from it uniformly at
random. It is not hard to see that with very high probability, the query will be at least $cr - o(1)$
apart from all the data points except one. Thus, any data structure for $(c - o(1), r)$-ANN must be
able to recover the data point the query was planted to.
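The key fact behind this instance is that independent uniform points on a high-dimensional sphere of radius R are all at distance close to $\sqrt{2}R$ from each other, so with $R = cr/\sqrt{2}$ the typical distance is about cr. The sketch below checks this numerically on the unit sphere; it is only an illustration, uses a much smaller dimension than $1000 \log n$, omits the query-planting step, and all function names are hypothetical.

```python
import math
import random


def random_unit_vector(d, rng):
    """Uniform point on the unit sphere: a normalized standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]


def random_sphere_points(n, d, radius, rng):
    """n independent uniform points on the sphere of the given radius in R^d."""
    return [[radius * x for x in random_unit_vector(d, rng)] for _ in range(n)]


def pairwise_distances(points):
    """Euclidean distances between all unordered pairs of points."""
    out = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            out.append(math.dist(points[i], points[j]))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    dists = pairwise_distances(random_sphere_points(20, 400, 1.0, rng))
    # All distances should be close to sqrt(2) ~ 1.414.
    print(min(dists), max(dists))
```

The concentration tightens as d grows, which is why the instance uses a dimension that is a large multiple of log n.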

Let us first contrast our approach to the previous algorithm of [AINR14]. That result improved
over LSH by identifying a “nice configuration” of a dataset, for which one can design a hash family
with better $\rho < 1/c^2$. It turns out that the right notion of niceness is the ability to enclose the dataset
into a ball of small radius, of order $O(cr)$ (the aforementioned random instance corresponds to the
“tightest possible” radius of $cr/\sqrt{2}$). Moreover, the smaller the enclosing ball is, the better the
exponent $\rho$ one can obtain. The iterative partitioning from [AINR14] consists of two rounds. During
the first round, one partitions the space so that the dataset in each part would be “low-diameter”
with high probability. This step uses classical, data-independent LSH, and hence effectively has
quality $\rho = 1/c^2$. During the second round, one would apply “low-diameter” LSH with quality
$\rho < 1/c^2$. The final exponent $\rho$ is a weighted average of qualities of the two rounds. While one can
generalize their approach to any number of rounds, the best exponent $\rho$ one can obtain this way is
around $0.73/c^2 + O(1/c^3)$ [Raz14], which falls short of (1).

In fact, [AINR14] cannot obtain the optimal $\rho$ as in (1) in principle. The fundamental issue is
that, before one completes the reduction to a “nice configuration”, one must incur some “waste”. In
particular, the first round uses (non-optimal) data-independent hashing, and hence the “average” $\rho$
cannot meet the best-achievable $\rho$. (Moreover, even the second round of the algorithm does not
achieve the optimal $\rho$.)

Thus, the real challenge remained: how to perform each iteration of the partitioning with
optimal $\rho$? E.g., we must succeed even during the very first iteration, on a dataset without any
structure whatsoever.

Our new algorithm resolves precisely this challenge. For simplicity, let us assume that all the
data points lie on a sphere of radius R. (It is helpful to think of R as being, say, 1000cr: e.g., the
low-diameter family from [AINR14] gives almost no advantage over the data-independent $\rho = 1/c^2$
for such R.) We start by decomposing the data set into a small number of dense clusters (by this
we mean a spherical cap that is slightly smaller than a hemisphere and that covers $n^{1-o(1)}$ points)
and a pseudo-random remainder that has no dense parts. For dense clusters we recurse, enclosing
them in balls of radius slightly smaller than R, and for the pseudo-random part we just apply one
iteration of the “low-diameter” LSH from [AINR14] and then recurse on each part. This partitioning
subroutine makes progress in two ways. For dense clusters we slightly reduce the radius of the
instance. Thus, after a bounded number of such reductions we will arrive at an instance that can be
easily handled using the low-diameter family. For the pseudo-random remainder, we can argue that
the low-diameter family works “unreasonably” well: intuitively, it follows from the fact that almost
all pairs of data points are very far apart (roughly speaking, at distance almost $\sqrt{2}R$). We call the
remainder pseudo-random precisely because of the latter property: random points on a sphere of
radius R are essentially $(\sqrt{2}R)$-separated.

We note that one can see the partition from above as a (kind of) decision tree, albeit with a few 
particularities. Each node of the decision tree has a number of children that partition the dataset 
points (reaching this node), corresponding to the dense clusters as well as to all the (non-empty) 
parts of the aforementioned “low-diameter” LSH partition. In fact, it will be necessary for some 
points to be replicated in a few children (hence the decision tree size may be $\gg n$), though we will
show that the degree of replication is bounded (a factor of $n^{o(1)}$ overall). The query procedure will
search the decision tree by following a path from the root to the leaves. Again, it may be necessary
to follow a few children at a decision node at a time; but the replication is bounded as well. The
final data structure is composed of $n^{\rho}$ such decision trees, and the query algorithm queries each of
them. 

A new aspect of our analysis is that, among other things, we will need to understand how the 
low-diameter family introduced in [AINR14] works on triples of points: without this analysis, one 
can only get a bound of

$$\rho = \frac{1}{2c^2 - 2},$$

which is much worse than (1) for small c. To the best of our knowledge, this is the first time when
one needs to go beyond the “pairwise” analysis in the study of LSH. 

1.3 Further implications and connections 

Our algorithm also directly applies to the Hamming metric, for which it achieves⁵ an exponent of

$$\rho = \frac{1}{2c - 1}.$$

This is a nearly quadratic improvement over the original LSH paper [IM98], which obtained exponent
$\rho = 1/c$, previously shown to be optimal for the classical LSH in [OWZ11]. The result of [AINR14]
was the first to bypass the 1998 bound via data-dependent hashing, achieving $\rho = \frac{7}{8c} + O\!\left(\frac{1}{c^{3/2}}\right)$, an
improvement for large enough c. Our new bound improves over [IM98] for all values of c, and is
also optimal (the above discussion for $\ell_2$ applies here as well).

⁵ This follows from a standard embedding of the $\ell_1$ norm into $\ell_2$-squared [LLR95].

From a broader perspective, we would like to point out the related developments in practice. 
Many or most of the practical applications of LSH involve designing data-aware hash functions [Spr91, 
McN01, VKD09, WTF08, SH09, YSRL11] (see also a recent survey [WSSJ14]). The challenge of
understanding and exploiting the relative strengths of data-oblivious versus data-aware methods
has been recognized as a major open question in the area (e.g., see [Cou13], page 77). This paper
can be seen as part of the efforts addressing the challenge. 

Let us also point out a recent paper [AIK+15] that provides an analog of Spherical LSH
(see Section 3) that not only has the same theoretical guarantees, but is also practical; in particular,
it outperforms in practice the celebrated Hyperplane LSH [Cha02] for similarity search on a sphere.

Finally, we note that our inherently static data structure can be dynamized: we can perform 
insertions/deletions in time $d \cdot n^{\frac{1}{2c^2-1}+o(1)}$ using a well-known dynamization technique for
decomposable search problems [OvL81]. Here we crucially use the fast preprocessing routine developed in
the present paper. 

2 Preliminaries 

In the text we denote the $\ell_2$ norm by $\|\cdot\|$. From now on, when we use $O(\cdot)$, $o(\cdot)$, $\Omega(\cdot)$ or $\omega(\cdot)$ we
explicitly write all the parameters that the corresponding constant factors depend on as subscripts
(the variable is always n or derived functions of n). Our main tool will be random partitions of
a metric space. For a partition $\mathcal{R}$ and a point p we denote by $\mathcal{R}(p)$ the part of $\mathcal{R}$ which p belongs
to. If $\mathcal{R}$ is a partition of a subset of the space, then we denote by $\bigcup \mathcal{R}$ the union of all the pieces
of $\mathcal{R}$. By $N(a, \sigma^2)$ we denote a Gaussian with mean a and variance $\sigma^2$. We denote the
closed Euclidean ball with a center u and a radius $r \ge 0$ by $B(u, r)$. By $\partial B(u, r)$ we denote the
corresponding sphere. We denote by $S^{d-1} \subset \mathbb{R}^d$ the unit Euclidean sphere in $\mathbb{R}^d$ with the center being
the origin. A spherical cap can be considered as a ball on a sphere with metric inherited from
the ambient Euclidean distance. We define a radius of a spherical cap to be the radius of the
corresponding ball. For instance, the radius of a hemisphere of a unit sphere is equal to $\sqrt{2}$.

Definition 2.1. The (c, r)-Approximate Near Neighbor problem (ANN) with failure probability f
is to construct a data structure over a set of points $P \subseteq \mathbb{R}^d$ supporting the following query: given
any fixed query point $q \in \mathbb{R}^d$, if there exists $p \in P$ with $\|p - q\| \le r$, then report some $p' \in P$ such
that $\|p' - q\| \le cr$, with probability at least $1 - f$.

Note that we allow preprocessing to be randomized as well, and we measure the probability of 
success over the random coins tossed during both preprocessing and query phases. 

Definition 2.2 ([HPIM12]). We call a random partition $\mathcal{R}$ of $\mathbb{R}^d$ $(r_1, r_2, p_1, p_2)$-sensitive, if for
every $x, y \in \mathbb{R}^d$ we have $\Pr_{\mathcal{R}}[\mathcal{R}(x) = \mathcal{R}(y)] \ge p_1$ if $\|x - y\| \le r_1$, and $\Pr_{\mathcal{R}}[\mathcal{R}(x) = \mathcal{R}(y)] \le p_2$ if
$\|x - y\| \ge r_2$.

Remark: For $\mathcal{R}$ to be useful we require that $r_1 < r_2$ and $p_1 > p_2$.

Now we are ready to state a very general way to solve ANN, if we have a good $(r, cr, p_1, p_2)$-sensitive
partition [IM98, HPIM12]. The following theorem gives a data structure with near-linear
space and small query time, but with probability of success being only inversely polynomial in the 
number of points. 

Theorem 2.3 ([IM98, HPIM12]). Suppose that there is a $(r, cr, p_1, p_2)$-sensitive partition $\mathcal{R}$ of $\mathbb{R}^d$,
where $p_1, p_2 \in (0, 1)$, and let $\rho = \ln(1/p_1)/\ln(1/p_2)$. Assume that $p_1, p_2 \ge 1/n^{o_c(1)}$, and that one can sample
a partition from $\mathcal{R}$ in time $n^{o_c(1)}$, store it in space $n^{o_c(1)}$ and perform point location in time $n^{o_c(1)}$.
Then there exists a data structure for (c, r)-ANN over a set $P \subseteq \mathbb{R}^d$ with $|P| = n$ with preprocessing
time $O(dn^{1+o_c(1)})$, probability of success at least $n^{-\rho-o_c(1)}$, space consumption (in addition to the
data points) $O(n^{1+o_c(1)})$ and expected query time $O(dn^{o_c(1)})$.

Remark 2.4. To obtain the final data structure we increase the probability of success from $n^{-\rho-o_c(1)}$
to 0.99 by building $O(n^{\rho+o_c(1)})$ independent data structures from Theorem 2.3. Overall, we obtain
$O(dn^{\rho+o_c(1)})$ query time, $O(dn + n^{1+\rho+o_c(1)})$ space, and $O(dn^{1+\rho+o_c(1)})$ preprocessing time.
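The amplification step in Remark 2.4 is the standard independent-repetition argument: if one copy succeeds with probability p, then k independent copies all fail with probability $(1-p)^k$, so $k = \lceil \ln(1/0.01) / \ln(1/(1-p)) \rceil \approx 4.6/p$ copies reach success probability 0.99. A minimal sketch follows; the function name is hypothetical.

```python
import math


def copies_needed(success_prob: float, target: float = 0.99) -> int:
    """Smallest k with 1 - (1 - success_prob)^k >= target, for independent copies."""
    if not (0.0 < success_prob < 1.0) or not (0.0 < target < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - success_prob))


if __name__ == "__main__":
    n, rho = 10**6, 1.0 / 7.0  # hypothetical n; rho = 1/(2c^2 - 1) with c = 2
    p = n ** (-rho)            # per-copy success probability, ignoring o(1) terms
    print(copies_needed(p))    # grows like a constant times n^rho
```

With $p = n^{-\rho}$ this gives $O(n^{\rho})$ copies, which is where the factor $n^{\rho}$ in the final query time and space comes from.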

In our algorithm, we will also be using the following (data-independent) LSH scheme. 

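Theorem 2.5 ([DIIM04]). There exists a random partition $\mathcal{R}$ of $\mathbb{R}^d$ such that for every $u, v \in \mathbb{R}^d$
with $\|u - v\| = \tau$ one has $\ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]} = (1 + O_\tau(1/d)) \cdot \tau\sqrt{d}$. Moreover, $\mathcal{R}$ can be sampled in
time $d^{O(1)}$, stored in space $d^{O(1)}$ and a point can be located in $\mathcal{R}$ in time $d^{O(1)}$.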

3 Spherical LSH 

In this section, we describe a partitioning scheme of the unit sphere $S^{d-1}$, termed Spherical LSH. We
will use Spherical LSH in our data structure described in the next section. While the Spherical LSH
was introduced in [AINR14], we need to show a new important property of it. We then illustrate
how Spherical LSH achieves optimal $\rho$ for the ANN problem in the “base case” of random instances.
As mentioned in the Introduction, the main thrust of the new data structure will be to reduce a
worst-case dataset to this “base case”. Let us point out that a partitioning procedure similar to
Spherical LSH has been used in [KMS98] for a completely different purpose.

The main idea of the Spherical LSH is to “carve” spherical caps of radius $\sqrt{2} - o(1)$ (almost
hemispheres). The partitioning proceeds as follows:

    $\mathcal{R} \leftarrow \emptyset$
    while $\bigcup \mathcal{R} \ne S^{d-1}$ do
        sample $g \sim N(0, 1)^d$
        $U \leftarrow \{u \in S^{d-1} : \langle u, g \rangle \ge d^{1/4}\} \setminus \bigcup \mathcal{R}$
        if $U \ne \emptyset$ then
            $\mathcal{R} \leftarrow \mathcal{R} \cup \{U\}$

Here $\mathcal{R}$ denotes the resulting partition of $S^{d-1}$, $\bigcup \mathcal{R}$ denotes the union of all elements of $\mathcal{R}$ (the
currently partitioned subset of $S^{d-1}$) and $N(0, 1)^d$ is a standard d-dimensional Gaussian.

This partitioning scheme is not efficient: we cannot quickly compute $\bigcup \mathcal{R}$, and the partitioning
process can potentially be infinite. We will address these issues in Section A.3.
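One way to read the carving procedure from the point of view of a single point is: a point u lands in the part created by the first sampled Gaussian g with $\langle u, g \rangle \ge d^{1/4}$. The sketch below implements this lazy view. It is not the efficient variant of Section A.3: the class name `SphericalLSH`, the shared lazily generated Gaussian sequence, and the `max_caps` cutoff (which returns `None` instead of looping forever) are all assumptions made for illustration.

```python
import random


class SphericalLSH:
    """Lazy sketch of the carving partition of the unit sphere in R^d."""

    def __init__(self, d, seed=0, max_caps=5000):
        self.d = d
        self.threshold = d ** 0.25       # caps are {u : <u, g> >= d^(1/4)}
        self.max_caps = max_caps         # assumed cutoff; not part of the scheme
        self._rng = random.Random(seed)
        self._gaussians = []             # g_0, g_1, ... shared by all points

    def _gaussian(self, i):
        """Return the i-th Gaussian vector, sampling new ones on demand."""
        while len(self._gaussians) <= i:
            self._gaussians.append(
                [self._rng.gauss(0.0, 1.0) for _ in range(self.d)]
            )
        return self._gaussians[i]

    def locate(self, u):
        """Index of the first cap containing the unit vector u, or None."""
        for i in range(self.max_caps):
            g = self._gaussian(i)
            if sum(a * b for a, b in zip(u, g)) >= self.threshold:
                return i
        return None


if __name__ == "__main__":
    lsh = SphericalLSH(d=16, seed=1)
    u = [0.25] * 16                      # a unit vector in R^16
    print(lsh.locate(u))
```

Two points collide exactly when the same Gaussian is the first to capture each of them. Since $\langle u, g \rangle$ is a standard Gaussian for a unit vector u, the expected number of samples before u is captured grows like $\exp(\Theta(\sqrt{d}))$, in line with the running times stated below for Spherical LSH.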

Now let us state the properties of Spherical LSH (the efficiency aspects are for a slightly modified 
variant of it, according to Section A.3). The proof is deferred to Appendix A. 


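Theorem 3.1. There exists a positive constant $\delta > 0$ such that for sufficiently large d, Spherical
LSH $\mathcal{R}$ on $S^{d-1}$ has the following properties. Suppose that $\varepsilon = \varepsilon(d) > 0$ tends to 0 as d tends to
infinity; also let $\tau \in [d^{-\delta}; 2 - d^{-\delta}]$. Then, for every $u, v, w \in S^{d-1}$, we have

$$\|u - v\| \ge \tau \ \text{ implies } \ \ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]} \ge \left(1 - d^{-\Omega(1)}\right) \cdot \frac{\tau^2}{4 - \tau^2} \cdot \frac{\sqrt{d}}{2}, \tag{2}$$

$$\|u - v\| \le \tau \ \text{ implies } \ \ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]} \le \left(1 + d^{-\Omega(1)}\right) \cdot \frac{\tau^2}{4 - \tau^2} \cdot \frac{\sqrt{d}}{2}, \tag{3}$$

$$\|u - w\|, \|v - w\| \in \sqrt{2} \pm \varepsilon \ \text{ and } \ \|u - v\| \le 1.99 \ \text{ imply } \ \ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(w) \mid \mathcal{R}(u) = \mathcal{R}(v)]} \ge \left(1 - \varepsilon^{\Omega(1)} - d^{-\Omega(1)}\right) \cdot \frac{\sqrt{d}}{2}. \tag{4}$$

Moreover, we can sample a partition in time $\exp(O(\sqrt{d}))$, store it in space $\exp(O(\sqrt{d}))$ and locate a
point from $S^{d-1}$ in it in time $\exp(O(\sqrt{d}))$.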


Discussion of the three-point collision property (4). While properties (2) and (3) were
derived in [AINR14] (under an additional assumption $\tau \le \sqrt{2}$), the property (4) is the new
contribution of Theorem 3.1.

Let us elaborate why proving (4) is challenging. First, one can easily show that

$$\ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(w) \mid \mathcal{R}(u) = \mathcal{R}(v)]} \ge \ln \frac{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(w)]} = \ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(w)]} - \ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]} \approx \frac{\sqrt{d}}{2} - \ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]}, \tag{5}$$

where the last step follows from (2), (3) and the fact that $\|u - w\| \approx \sqrt{2}$. However, this is worse than
$\sqrt{d}/2$ claimed in (4) provided that u and v are not too close. It turns out that (5) is tight, if we do
not assume anything about $\|v - w\|$ (for instance, when u, v and w lie on a great circle). So, we have
to “open the black box” and derive (4) from the first principles. A high-level idea of the analysis
is to observe that, when $\|v - w\| \approx \sqrt{2}$, certain directions of interest become almost orthogonal,
so the corresponding Gaussians are almost independent, which gives almost independence of the
events “$\mathcal{R}(u) = \mathcal{R}(w)$” and “$\mathcal{R}(u) = \mathcal{R}(v)$”, which, in turn, implies

$$\ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(w) \mid \mathcal{R}(u) = \mathcal{R}(v)]} \approx \ln \frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(w)]} \approx \frac{\sqrt{d}}{2}$$

as required. Again, see the full argument in Appendix A.


3.1 Implications for ANN 

It is illuminating to see what Spherical LSH implies for a random instance (as defined in the 
Introduction). Since all the points lie on a sphere of radius $cr/\sqrt{2}$, we can plug Theorem 3.1 into
Theorem 2.3, and thus obtain the exponent

$$\rho \le \frac{1}{2c^2 - 1} + o_c(1).$$


Note that we achieve the desired bound (1) for random instances by using the Spherical LSH directly. 
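For completeness, here is a sketch of the arithmetic behind this exponent, obtained by substituting the rescaled thresholds into (2) and (3). It assumes the instance is rescaled so that the sphere has unit radius and treats the query as lying on the sphere; lower-order terms are not tracked carefully.

```latex
% Rescale the random instance to the unit sphere: divide distances by cr/\sqrt{2}.
\[
  r_1 = \frac{r}{cr/\sqrt{2}} = \frac{\sqrt{2}}{c}, \qquad
  r_2 = \frac{cr}{cr/\sqrt{2}} = \sqrt{2}.
\]
% Near pairs: apply (3) with \tau = r_1.
\[
  \ln\frac{1}{p_1} \le \left(1 + d^{-\Omega(1)}\right) \cdot \frac{2/c^2}{4 - 2/c^2} \cdot \frac{\sqrt{d}}{2}
  = \left(1 + d^{-\Omega(1)}\right) \cdot \frac{1}{2c^2 - 1} \cdot \frac{\sqrt{d}}{2}.
\]
% Far pairs: apply (2) with \tau = r_2.
\[
  \ln\frac{1}{p_2} \ge \left(1 - d^{-\Omega(1)}\right) \cdot \frac{2}{4 - 2} \cdot \frac{\sqrt{d}}{2}
  = \left(1 - d^{-\Omega(1)}\right) \cdot \frac{\sqrt{d}}{2}.
\]
% Ratio of the two bounds.
\[
  \rho = \frac{\ln(1/p_1)}{\ln(1/p_2)}
  \le \frac{1 + d^{-\Omega(1)}}{1 - d^{-\Omega(1)}} \cdot \frac{1}{2c^2 - 1}
  = \frac{1}{2c^2 - 1} + o(1).
\]
```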


4 The data structure 

In this section we describe the new data structure. First, we show how to achieve success probability
$n^{-\rho}$, query time $n^{o_c(1)}$, and space and preprocessing time $n^{1+o_c(1)}$, where $\rho = \frac{1}{2c^2-1} + o_c(1)$. Finally,
to obtain the final result, one then builds $O(n^{\rho})$ copies of the above data structure to amplify
the probability of success to 0.99 (as explained in Remark 2.4). We analyze the data structure in 
Section 5. 

4.1 Overview 

We start with a high-level overview. Consider a dataset $P_0$ of n points. We can assume that
r = 1 by rescaling. We may also assume that the dataset lies in the Euclidean space of dimension
$d = \Theta(\log n \cdot \log\log n)$: one can always reduce the dimension to d by applying the Johnson-Lindenstrauss
lemma [JL84, DG03] while incurring distortion at most $1 + 1/(\log\log n)^{\Omega(1)}$ with high probability.

For simplicity, suppose that the entire dataset $P_0$ and a query lie on a sphere $\partial B(0, R)$ of radius
$R = O_c(1)$. If $R \le c/\sqrt{2}$, we are done: this case corresponds to the “nicest configuration” of points
and we can apply Theorem 2.3 equipped with Theorem 3.1 (see the discussion in Section 3.1).

Now suppose that $R \ge c/\sqrt{2}$. We split $P_0$ into a number of disjoint components: $\ell$ dense
components, termed $C_1, C_2, \ldots, C_\ell$, and one pseudo-random component, termed P. The properties
of these components are as follows. For each dense component $C_i$ we require that $|C_i| \ge \tau n$ and
that $C_i$ can be covered by a spherical cap of radius $(\sqrt{2} - \varepsilon)R$. Here $\tau, \varepsilon > 0$ are small positive
quantities to be chosen later. The pseudo-random component P is such that it contains no more
dense components inside.

Figure 1: Covering a spherical cap of radius $(\sqrt{2} - \varepsilon)R$

We proceed separately for each $C_i$ and P as follows. For every dense component $C_i$, we enclose it
in a ball $E_i$ of radius $(1 - \Theta(\varepsilon^2))R$ (see Figure 1). For simplicity, let us first ignore the issue that $C_i$
does not necessarily lie on the boundary $\partial E_i$. Then, we can just recurse for the resulting spherical
instance with radius $(1 - \Theta(\varepsilon^2))R$. We treat the pseudo-random part P completely differently. We
sample a partition (hash function) $\mathcal{R}$ of $\partial B(0, R)$ using Theorem 3.1. Then we partition P using $\mathcal{R}$
and recurse on each non-empty part. Note that after we have partitioned P, there may appear new
dense clusters in some parts (since it may become easier to satisfy the minimum size constraint).

During the query procedure, we will recursively query each $C_i$. Furthermore, for the pseudo-random
component P, we locate the part of $\mathcal{R}$ that captures the query point, and recursively query
this part. Overall, there are $(\ell + 1)$ recursive calls.

To analyze our algorithm, we show that we make progress in two ways. First, for dense clusters
we reduce the radius of a sphere by a factor of $(1 - \Theta(\varepsilon^2))$. Hence, in $O_c(1/\varepsilon^2)$ iterations we must
arrive at the case of $R \le c/\sqrt{2}$, which is easy (as argued above). Second, for the pseudo-random
component P, we argue that most of the points lie at distance at least $(\sqrt{2} - \varepsilon)R$ from each other. In
particular, the ratio of R to a typical inter-point distance is $\approx 1/\sqrt{2}$, like in a random case for which
Spherical LSH from Theorem 3.1 is efficient, as discussed in Section 3.1. (This is exactly the reason
why we call P pseudo-random.) Despite the simplicity of this intuition, the actual analysis is quite
involved: in particular, this is the place where we use the three-point property (4) of the Spherical
LSH.

It remains to address the issue deferred in the above high-level description: namely, that a dense 
component $C_i$ does not generally lie on $\partial E_i$, but rather can occupy the interior of $E_i$. We deal with
it by partitioning $E_i$ into very thin annuli of carefully chosen width $\delta$. We then treat each annulus
as a sphere. This discretization of a ball adds to the complexity of the analysis, although it does 
not seem to be fundamental from the conceptual point of view. 

Finally, we also show how to obtain fast preprocessing, which turns out to be a non-trivial task, 
as we discuss in Section 6. The main bottleneck is in finding dense components, for which we show 
a near-linear time algorithm. Roughly, the idea is to restrict ourselves to dense components with 
centers in data points: this gives preprocessing time $n^{2+o_c(1)}$; we improve it further, to $n^{1+o_c(1)}$, by
sampling the dataset and searching for dense components in the sample only (intuitively, this works
because we require the dense components to contain many points).

Figure 2: The definition of Project

4.2 Formal description 

We are now ready to describe the data structure formally. It depends on the (small positive)
parameters $\tau$, $\varepsilon$ and $\delta$, which we will need to choose carefully later on. The pseudocode appears as
Figure 3.

Preprocessing. Our preprocessing algorithm consists of the following functions: 

• ProcessSphere(P, $r_1$, $r_2$, o, R) builds the data structure for a pointset P that lies on a
sphere $\partial B(o, R)$, assuming we need to solve ANN with distance thresholds $r_1$ and $r_2$. Moreover,
we are guaranteed that queries will lie on $\partial B(o, R)$, too.

• ProcessBall(P, $r_1$, $r_2$, o, R) builds the data structure for a dataset P that lies inside the
ball $B(o, R)$, assuming we need to solve ANN with distance thresholds $r_1$ and $r_2$. Unlike
ProcessSphere, here queries can be arbitrary.

• Process(P) builds the data structure for a dataset P to solve the general (1, c)-ANN.

• Project($R_1$, $R_2$, r) is an auxiliary function computing the following projection. Suppose
we have two spheres $S_1$ and $S_2$ with a common center and radii $R_1$ and $R_2$. Suppose there
are points $p_1 \in S_1$ and $p_2 \in S_2$ with $\|p_1 - p_2\| = r$. Project($R_1$, $R_2$, r) returns the distance
between $p_1$ and the point $\tilde p_2$ that lies on $S_1$ and is the closest to $p_2$ (see Figure 2).

We now elaborate on algorithms in each of the above functions. 

ProcessSphere. Function ProcessSphere follows the exposition from Section 4.1. First, we
consider three base cases. If $r_2 \ge 2R$, then the goal can be achieved trivially, since any point from
P works as an answer for any valid query. If $r_1/r_2 \le 1/(2c^2 - 1)$, then Theorem 2.3 coupled with
the hash family from Theorem 2.5 does the job. Similarly, if $r_2 \ge \sqrt{2}R$, then we can use the family
from Theorem 3.1 (see the discussion in Section 3.1). Otherwise, we find non-trivially smaller balls
(of radius $(\sqrt{2} - \varepsilon)R$) with centers on $\partial B(o, R)$ that contain many data points (at least $\tau|P|$). These
balls can be enclosed into balls (with unconstrained center) of radius $\widetilde{R} \le (1 - \Omega(\varepsilon^2))R$ (proven in
Claim 5.1). For these balls we invoke ProcessBall. Then, for the remaining points we sample a
partition of $\partial B(o, R)$ using Theorem 3.1, and recurse on each part. We note that in order to apply
Theorem 2.5 and Theorem 3.1 we need certain conditions on $r_1$, $r_2$ and R to hold (we define and
verify them in Claim 5.2).

ProcessBall. First, we consider the following simple base case. If $r_1 + 2R \le r_2$, then any point
from $B(o, R)$ could serve as a valid answer to any query.

In general, we reduce to the spherical case via a discretization of the ball $B(o, R)$. First, we
round all the distances to $o$ up to a multiple of $\delta$, which can change the distance between any pair of
points by at most $2\delta$ (by the triangle inequality). Then, for every possible distance $\delta i$ from $o$ to
a data point and every possible distance $\delta j$ from $o$ to a query (for admissible integers $i, j$), we build
a separate data structure via ProcessSphere (we also need to check that $\delta|i - j| \le r_1 + 2\delta$ to
ensure that the corresponding pair $(i, j)$ does not yield a trivial instance). We compute the new
distance thresholds $\widetilde{r}_1$ and $\widetilde{r}_2$ for this data structure as follows. After rounding, the new thresholds
for the ball instance should be $r_1 + 2\delta$ and $r_2 - 2\delta$, since distances can change by at most $2\delta$. To
compute the final thresholds (after projecting the query to the sphere of radius $\delta i$), we just invoke
Project (see the definition above).
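The rounding step can be illustrated with a minimal Python sketch. The helper names `round_to_shell` and `dist` and the sample coordinates are assumptions made for illustration; the only thing taken from the text is that each point is moved radially so that its distance to $o$ becomes the next multiple of $\delta$.

```python
import math

def round_to_shell(p, o, delta):
    """Move p radially away from o so that its distance to o is a multiple of delta."""
    diff = [pi - oi for pi, oi in zip(p, o)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        return list(o)  # the center itself already lies on the shell of radius 0
    shell = math.ceil(norm / delta)      # index i of the shell of radius delta * i
    scale = (delta * shell) / norm
    return [oi + scale * x for oi, x in zip(o, diff)]

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical sample data, chosen only to exercise the function.
o = [0.0, 0.0]
delta = 0.1
p, q = [0.33, 0.41], [-0.72, 0.18]
p_r, q_r = round_to_shell(p, o, delta), round_to_shell(q, o, delta)
# Each point moves by less than delta, so a pairwise distance changes by less than 2 * delta.
change = abs(dist(p_r, q_r) - dist(p, q))
```

Since every point moves by less than $\delta$, the triangle inequality gives the $2\delta$ bound on the change of any pairwise distance, which is why the thresholds become $r_1 + 2\delta$ and $r_2 - 2\delta$.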

Process. Process reduces the general case to the ball case. We proceed similarly to ProcessSphere,
with three modifications. First, instead of the family from Theorem 3.1, we use the
family from Theorem 2.5, which is designed for partitioning the whole $\mathbb{R}^d$ rather than just a sphere.
Second, we seek to find clusters of radius $2c^2$. Third, we do not need to find the smallest enclosing
ball for $P \cap B(x, 2c^2)$: instead, $B(x, 2c^2)$ itself is enough.

Project. This is implemented by a formula (see Figure 2). 
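As a sanity check of the formula, here is a small Python sketch. The function name `project` and the planar test configuration are illustrative assumptions; the formula itself is the one in the pseudocode of Figure 3, $\sqrt{R_1 (r^2 - (R_1 - R_2)^2)/R_2}$.

```python
import math

def project(R1, R2, r):
    """Distance to a point on the sphere of radius R1 after a point at distance R2
    from the center (originally at distance r from it) is rescaled onto that sphere."""
    return math.sqrt(R1 * (r * r - (R1 - R2) ** 2) / R2)

# Planar check: p lies on the sphere of radius R1, q on the sphere of radius R2.
R1, R2, theta = 1.0, 1.3, 0.4
p = (R1, 0.0)
q = (R2 * math.cos(theta), R2 * math.sin(theta))
r = math.dist(p, q)
q_proj = (R1 * math.cos(theta), R1 * math.sin(theta))  # q rescaled onto the sphere of radius R1
direct = math.dist(p, q_proj)                          # distance measured directly
```

In this configuration `project(R1, R2, r)` agrees with `direct` up to floating-point error, and when $R_1 = R_2$ the threshold is unchanged.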

Overall, the preprocessing creates a decision tree, where the nodes correspond to procedures 
ProcessSphere, ProcessBall, Process. We refer to the tree nodes correspondingly, using the 
labels in the below description of the query algorithm. 

Observe that currently the preprocessing is expensive: a priori it is not even clear how to make 
it polynomial in n as we need to search over all possible ball centers o. We address this challenge in 
Section 6. 

Query procedure. Consider a query point $q \in \mathbb{R}^d$. We run the query on the decision tree,
starting with the root, and applying the following algorithms depending on the label of the nodes:

• In Process we first recursively query the ball data structures. Second, we locate $q$ in $\mathcal{R}$, and
query the data structure we built for $P \cap \mathcal{R}(q)$.
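function Process($P$)
    $m \leftarrow |P|$
    while $\exists\, x \in \mathbb{R}^d : |B(x, 2c^2) \cap P| \ge \tau m$ do
        ProcessBall($P \cap B(x, 2c^2)$, $1$, $c$, $x$, $2c^2$)
        $P \leftarrow P \setminus B(x, 2c^2)$
    sample $\mathcal{R}$ according to Theorem 2.5
    for $U \in \mathcal{R}$ do
        if $P \cap U \ne \emptyset$ then
            Process($P \cap U$)

function ProcessBall($P$, $r_1$, $r_2$, $o$, $R$)
    if $r_1 + 2R \le r_2$ then
        store any point from $P$
        return
    $P \leftarrow \{\, o + \delta \lceil \|p - o\| / \delta \rceil \cdot \frac{p - o}{\|p - o\|} \mid p \in P \,\}$
    for $i \leftarrow 0 \ldots \lceil R/\delta \rceil$ do
        $\widetilde{P} \leftarrow \{ p \in P : \|p - o\| = \delta i \}$
        if $\widetilde{P} \ne \emptyset$ then
            for $j \leftarrow 0 \ldots \lceil (R + r_1 + 2\delta)/\delta \rceil$ do
                if $\delta |i - j| \le r_1 + 2\delta$ then
                    $\widetilde{r}_1 \leftarrow$ Project($\delta i$, $\delta j$, $r_1 + 2\delta$)
                    $\widetilde{r}_2 \leftarrow$ Project($\delta i$, $\delta j$, $r_2 - 2\delta$)
                    ProcessSphere($\widetilde{P}$, $\widetilde{r}_1$, $\widetilde{r}_2$, $o$, $\delta i$)

function Project($R_1$, $R_2$, $r$)
    return $\sqrt{R_1 (r^2 - (R_1 - R_2)^2) / R_2}$

function ProcessSphere($P$, $r_1$, $r_2$, $o$, $R$)
    if $r_2 \ge 2R$ then
        store any point from $P$
        return
    if $r_1/r_2 \le 1/(2c^2 - 1)$ then
        apply Theorem 2.3 with Theorem 2.5 to $P$
        return
    if $r_2 \ge \sqrt{2} R$ then
        apply Theorem 2.3 with Theorem 3.1 to $P$
        return
    $m \leftarrow |P|$
    $\widehat{R} \leftarrow (\sqrt{2} - \varepsilon) R$
    while $\exists\, x \in \partial B(o, R) : |B(x, \widehat{R}) \cap P| \ge \tau m$ do
        $B(\widetilde{o}, \widetilde{R}) \leftarrow$ the seb for $P \cap B(x, \widehat{R})$
        ProcessBall($P \cap B(x, \widehat{R})$, $r_1$, $r_2$, $\widetilde{o}$, $\widetilde{R}$)
        $P \leftarrow P \setminus B(x, \widehat{R})$
    sample $\mathcal{R}$ according to Theorem 3.1
    for $U \in \mathcal{R}$ do
        if $P \cap U \ne \emptyset$ then
            ProcessSphere($P \cap U$, $r_1$, $r_2$, $o$, $R$)


Figure 3: Pseudocode of the data structure (seb stands for smallest enclosing ball)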


• In ProcessBall, we first consider the base case, where we just return the stored point if it is
close enough. In general, we check if $\|q - o\| \le R + r_1$. If not, we can return. Otherwise, we
round $q$ so that the distance from $o$ to $q$ is a multiple of $\delta$. Next, we enumerate the distances
from $o$ to the potential near neighbor we are looking for, and query the corresponding
ProcessSphere children after projecting $q$ on the sphere with a tentative near neighbor
(using, naturally, Project).

• In ProcessSphere, we proceed exactly the same way as Process modulo the base cases, 
which we handle according to Theorem 2.3. 
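The ProcessBall step of the query can be sketched as follows. This is only an illustration under stated assumptions: the function name `sphere_children_to_query` and its return convention are hypothetical, and the admissibility test $\delta|i - j| \le r_1 + 2\delta$ is the one used during preprocessing.

```python
import math

def sphere_children_to_query(dist_q, R, r1, delta):
    """Return (j, shells): the rounded shell index j of the query and the data
    shell indices i whose ProcessSphere children are worth querying.
    Returns None when the query is too far from the ball B(o, R)."""
    if dist_q > R + r1:
        return None
    j = math.ceil(dist_q / delta)   # round the query's distance to o up to a multiple of delta
    max_i = math.ceil(R / delta)    # data points live on shells 0 .. ceil(R / delta)
    shells = [i for i in range(max_i + 1) if delta * abs(i - j) <= r1 + 2 * delta]
    return j, shells

# Hypothetical numbers: ball radius 1, near threshold 0.125, shell width 0.25.
result = sphere_children_to_query(0.6, R=1.0, r1=0.125, delta=0.25)
too_far = sphere_children_to_query(1.2, R=1.0, r1=0.125, delta=0.25)
```

For each returned pair $(i, j)$ the query would then be projected onto the sphere of radius $\delta i$ and passed to the corresponding child.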


5 Analysis of the data structure 

In this section we analyze the above data structure. 

5.1 Overview 

The most interesting part of the analysis is lower bounding the probability of success: we need
to show that it is at least $n^{-\rho - o_c(1)}$, where $\rho = \frac{1}{2c^2 - 1}$. The challenge is that we need to analyze a
(somewhat adaptive) random process. In particular, we cannot just use the probability of collision of far
points as is usually done in the analysis of (data-independent) LSH families. Instead, we use its
empirical estimate: namely, the expected number of data points remaining in the part containing the
query $q$. While this allows us to make some good-quality progress, this only lasts for a few iterations
of partitioning, until we run into the fact that the set is only pseudo-random and the deviations
from the “ideal structure” begin showing up more prominently (which is the reason that, after
partitioning, we again need to check for densely populated balls). Furthermore, while computing
this empirical estimate, we need to condition on the fact that the near neighbor is colliding with the
query.

A bit more formally, the proof proceeds in two steps. First, we show that whenever we apply
a partition $\mathcal{R}$ to a pseudo-random remainder $P$, the quality we achieve is great: the exponent
we get is $\ln(1/p_1)/\ln(1/p_2) \le \frac{1}{2c^2 - 1} + o_c(1)$ (see Claim 5.5). Here $p_1 = \Pr_{\mathcal{R}}[\mathcal{R}(p) = \mathcal{R}(q)]$ is the
probability for the query $q \in \mathbb{R}^d$ and its near neighbor $p \in P$ to collide under $\mathcal{R}$, and

$$p_2 = \mathbb{E}_{\mathcal{R}}\left[\frac{|\mathcal{R}(p) \cap P|}{|P|} \;\middle|\; \mathcal{R}(p) = \mathcal{R}(q)\right]$$

is the (conditioned) empirical estimate of the fraction of $P$ that collides with $p$. Note that,
when computing $p_2$, we condition on the fact that the query and its near neighbor collide (i.e.,
$\mathcal{R}(p) = \mathcal{R}(q)$). It is exactly this conditioning that requires the three-point property (4) of the
Spherical LSH. Furthermore, we use the fact that all the “dense” balls have been carved out, in
order to argue that, on average, many points are far away and so $p_2$ is essentially governed by
the collision probability of the Spherical LSH for distances around $\sqrt{2}R$. In the second step, we
proceed by lower bounding the probability of success via a careful inductive proof analyzing the
corresponding random process (Claim 5.6). Along the way, we use the above estimate crucially. See
Section 5.4 for details.

The rest of the analysis proves that the data structure occupies $n^{1 + o_c(1)}$ space and has $n^{o_c(1)}$
query time (in expectation). While a bit tedious, this is relatively straightforward. See Section 5.5
for details. Finally, we highlight that obtaining the near-linear preprocessing algorithm requires
further ideas and in particular utilizes the van der Corput lemma. See Section 6.


5.2 Setting parameters 

Recall that the dimension is $d = \Theta(\log n \cdot \log\log n)$. We set $\varepsilon$, $\delta$, $\tau$ as follows:

• $\varepsilon = \frac{1}{\log\log\log n}$;

• $\delta = \exp(-(\log\log\log n)^C)$;

• $\tau = \exp(-\log^{2/3} n)$,

where $C$ is a sufficiently large positive constant (the concrete value of $C$ is only important for the
proof of Claim 5.2).
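To get a feel for these settings, the following Python sketch evaluates them for concrete $n$. It rests on assumptions not fixed by the text: natural logarithms are used, and $C = 2$ is an arbitrary placeholder rather than the "sufficiently large" constant the analysis needs. The settings are asymptotic, so the numbers for practical $n$ are only indicative.

```python
import math

def parameters(n, C=2.0):
    """Evaluate (epsilon, delta, tau) for a given n.
    Assumptions: natural logarithms; C = 2 is a placeholder value."""
    lll = math.log(math.log(math.log(n)))
    if lll <= 0:
        raise ValueError("n must exceed e^e for log log log n to be positive")
    eps = 1.0 / lll
    delta = math.exp(-(lll ** C))
    tau = math.exp(-(math.log(n) ** (2.0 / 3.0)))
    return eps, delta, tau

eps, delta, tau = parameters(10 ** 9)
eps_large, _, _ = parameters(10 ** 100)
```

Even for $n = 10^{100}$ the value of $\varepsilon$ is far from small, which reflects how slowly $\log\log\log n$ grows.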
5.3 Invariants and auxiliary properties

We now state and prove several invariants and properties of the data structure that are needed for 
the subsequent analysis. 

Claim 5.1. In ProcessSphere we have $\widetilde{R} \le (1 - \Omega(\varepsilon^2))R$ (see Figure 1).

Proof. It is enough to show that for $x \in \partial B(o, R)$ there is a ball of radius $(1 - \Omega(\varepsilon^2))R$ that
covers $\partial B(o, R) \cap B(x, (\sqrt{2} - \varepsilon)R)$. Without loss of generality we can assume that $o = 0$ and
$x = (R, 0, \ldots, 0)$. Then, $\partial B(0, R) \cap B(x, (\sqrt{2} - \varepsilon)R) = \{u \in \partial B(0, R) : u_1 \ge \eta R\}$, where $\eta = \Theta(\varepsilon)$.
At the same time, we can cover $\{u \in \partial B(0, R) : u_1 \ge \eta R\}$ with the ball centered in $(\eta R, 0, \ldots, 0)$
and radius $R\sqrt{1 - \eta^2} = (1 - \Omega(\eta^2))R = (1 - \Omega(\varepsilon^2))R$. □
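The geometry of this proof can be checked numerically. The Python sketch below is an illustration only: the function names, the dimension 3, the value $\varepsilon = 0.1$ and the random seed are all assumptions. It computes $\eta$ from $\|u - x\|^2 = 2R^2 - 2R u_1$ and verifies on random cap points that the ball centered at $(\eta R, 0, \ldots, 0)$ with radius $R\sqrt{1 - \eta^2}$ covers the cap.

```python
import math
import random

def cap_enclosing_ball(R, eps):
    """Return (eta, radius) for a ball covering {u on the sphere : ||u - x|| <= (sqrt(2) - eps) R}."""
    # For x = (R, 0, ..., 0): ||u - x||^2 = 2R^2 - 2R*u_1, so the cap is {u_1 >= eta * R}.
    eta = 1.0 - (math.sqrt(2.0) - eps) ** 2 / 2.0
    return eta, R * math.sqrt(1.0 - eta * eta)

def random_cap_point(R, eta, dim, rng):
    """Rejection-sample a point on the sphere of radius R with first coordinate >= eta * R."""
    while True:
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in g))
        u = [R * x / norm for x in g]
        if u[0] >= eta * R:
            return u

rng = random.Random(0)
R, eps, dim = 1.0, 0.1, 3
eta, radius = cap_enclosing_ball(R, eps)
center = [eta * R] + [0.0] * (dim - 1)
# Largest distance from the proposed center to any sampled cap point.
worst = max(math.dist(random_cap_point(R, eta, dim, rng), center) for _ in range(1000))
```

Here $\eta = \sqrt{2}\varepsilon - \varepsilon^2/2$, so the saving $R - R\sqrt{1 - \eta^2}$ is of order $\varepsilon^2 R$, as the claim states.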

Lemma 5.2. The following invariants hold. 

• At every moment of time we have $r_2/r_1 \ge c - o_c(1)$, $r_2 \le c + o_c(1)$ and $R \le O_c(1)$;

• After checking for the base case in ProcessBall and the first base case in ProcessSphere
we have $r_2/R \ge \exp(-O_c(\log\log\log n)^{O(1)})$;

• At every moment of the preprocessing or the query procedures, the number of calls to
ProcessBall in the recursion stack is $O_c(\varepsilon^{-O(1)})$;

• The expected length of any contiguous run of calls to Process or ProcessSphere in the
recursion stack is $O(\log n)$ (again, during the preprocessing or the query).

The rest of the Section is devoted to proving Lemma 5.2. The proofs are quite technical, and 
can be omitted on first reading. 

Consider the recursion stack at any moment of the preprocessing or the query algorithms. It has 
several calls to ProcessBall interleaved with sequences of calls to Process and ProcessSphere. 
Our current goal is to bound the number of calls to ProcessBall that can appear in the recursion 
stack at any given moment (we want to bound it by $O_c(1/\varepsilon^6)$). First, we prove that this bound
holds under the (unrealistic) assumption that in ProcessBall the rounding of distances has no
effect (that is, all the distances of points to $o$ are already multiples of $\delta$). Second, we prove the
bound in full generality by showing that this rounding introduces only a tiny multiplicative error to
$r_1$, $r_2$ and $R$.

Claim 5.3. Suppose that the rounding of distances in ProcessBall has no effect (i.e., distances
from $o$ to all points are multiples of $\delta$). Then the number of the calls to ProcessBall in the
recursion stack at any given moment of time is $O_c(1/\varepsilon^6)$.

Proof. Let us keep track of two quantities: $\eta = r_2^2/r_1^2$ and $\xi = r_2^2/R^2$. It is immediate to see that
the initial value of $\eta$ is $c^2$, it is non-decreasing (it can only change when we apply Project, which
can only increase the ratio between $r_2$ and $r_1$), and it is at most $(2c^2 - 1)^2$ (otherwise, the base case
in ProcessSphere is triggered). Similarly, $\xi$ is initially equal to $1/(4c^2)$ and it can be at most $2$
(otherwise, the base case in ProcessSphere is triggered). Unlike $\eta$, the value of $\xi$ can decrease.

Suppose that in ProcessBall we call ProcessSphere for some $R_1 = \delta i$ and $R_2 = \delta j$ with
$|R_1 - R_2| = \Delta R$. Suppose that $\widetilde{\eta}$ is the new value of $\eta$ after this call. We have

$$\frac{\widetilde{\eta}}{\eta} = \frac{\widetilde{r}_2^2/\widetilde{r}_1^2}{r_2^2/r_1^2} = \frac{\mathrm{Project}(R_1, R_2, r_2)^2/\mathrm{Project}(R_1, R_2, r_1)^2}{r_2^2/r_1^2} = \frac{(r_2^2 - \Delta R^2)/(r_1^2 - \Delta R^2)}{r_2^2/r_1^2} = \frac{1 - \Delta R^2/r_2^2}{1 - \Delta R^2/r_1^2} \ge 1 + \Omega_c\left(\frac{\Delta R^2}{r_2^2}\right), \tag{6}$$

where the third step follows from the formula for Project and the last step follows from the fact
that $\eta \ge c^2 > 1$.

Let us call an invocation of ProcessSphere within ProcessBall $\lambda$-shrinking for some $\lambda > 0$,
if $\Delta R^2/r_2^2 \ge \lambda$. From (6), the fact that $\eta \in [c^2; (2c^2 - 1)^2]$ and that $\eta$ is non-decreasing, we conclude
that there can be at most $O_c(1/\lambda)$ $\lambda$-shrinking invocations of ProcessSphere in the recursion
stack at any given moment of time.

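Now let us see how $\xi$ evolves. Suppose that within some ProcessBall we call ProcessSphere
with some $R_1$ and $R_2$. Then, in ProcessSphere we call ProcessSphere recursively several times
without any change in $r_1$, $r_2$ and $R$, and finally we call ProcessBall again. Denote $\widetilde{\xi} = \widetilde{r}_2^2/\widetilde{R}^2$ the
new value of $\xi$ after this call of ProcessBall. We have

$$\frac{\widetilde{\xi}}{\xi} = \frac{\widetilde{r}_2^2/\widetilde{R}^2}{r_2^2/R^2} \ge (1 + \Omega(\varepsilon^2)) \cdot \frac{\widetilde{r}_2^2/R_1^2}{r_2^2/R^2} = (1 + \Omega(\varepsilon^2)) \cdot \frac{(r_2^2 - \Delta R^2)/(R_1 R_2)}{r_2^2/R^2} = (1 + \Omega(\varepsilon^2)) \left(1 - \frac{\Delta R^2}{r_2^2}\right) \frac{R^2}{R_1 R_2} \ge (1 + \Omega(\varepsilon^2)) \left(1 - \frac{\Delta R^2}{r_2^2}\right) \frac{1}{1 + \Delta R/R}, \tag{7}$$

where the second step follows from Claim 5.1, the third step follows from the formula for Project,
and the fifth step follows from the fact that $R_1 \le R$ and $R_2 \le R_1 + \Delta R \le R + \Delta R$.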

Denote $\lambda^* = \varepsilon^4/C$ for a sufficiently large constant $C > 0$. If the call to ProcessSphere within
ProcessBall is not $\lambda^*$-shrinking (that is, $\Delta R/r_2 < \varepsilon^2/\sqrt{C}$), then since

$$\frac{\Delta R}{R} = \frac{\Delta R}{r_2} \cdot \sqrt{\xi} \le \frac{\sqrt{2} \cdot \Delta R}{r_2} < \sqrt{\frac{2}{C}} \cdot \varepsilon^2,$$

where we use $\xi \le 2$, from (7) we have that $\widetilde{\xi}/\xi = 1 + \Omega(\varepsilon^2)$ (provided that $C$ is sufficiently large).
On the other hand, if the call is $\lambda^*$-shrinking, then since

$$\frac{\Delta R^2}{r_2^2} \le \frac{r_1^2}{r_2^2} \le \frac{1}{c^2}, \qquad \frac{\Delta R}{R} \le \frac{r_1}{R} = \frac{r_1}{r_2} \cdot \frac{r_2}{R} \le \frac{\sqrt{2}}{c},$$

we have from (7)

$$\frac{\widetilde{\xi}}{\xi} \ge \left(1 - \frac{1}{c^2}\right) \cdot \frac{1}{1 + \sqrt{2}/c} = \Omega_c(1),$$

since $c > 1$. That being said, non-$\lambda^*$-shrinking calls increase $\xi$ non-trivially (by at least $(1 + \Omega(\varepsilon^2))$),
while $\lambda^*$-shrinking calls decrease $\xi$ by not too much, and by the above discussion, there are at most
$O_c(1/\lambda^*) = O_c(1/\varepsilon^4)$ of them.

More formally, suppose we have $A$ calls to ProcessSphere that are $\lambda^*$-shrinking and $B$ calls
that are not $\lambda^*$-shrinking. We have that $A = O_c(1/\lambda^*) = O_c(1/\varepsilon^4)$. On the other hand, since every
$\lambda^*$-shrinking call multiplies $\xi$ by at least $\Omega_c(1)$, the initial value of $\xi$ is $1/(4c^2)$, and the final value
of $\xi$ is at most $2$, we have

$$(1 + \Omega(\varepsilon^2))^B \le \exp(O_c(A)),$$

thus, $B = O_c(1/\varepsilon^6)$. Overall, we have at most $A + B \le O_c(1/\varepsilon^6)$ invocations of ProcessBall in
the recursion stack at any moment of time. □
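The last counting step can be spelled out as follows. This is a sketch of the arithmetic only; the constants hidden in the $\Omega$ and $O_c$ notation are left implicit, and $\xi_{\mathrm{final}}$ is a label introduced here for the value of $\xi$ at the bottom of the stack.

```latex
% Each non-shrinking call gains a factor 1 + Omega(eps^2);
% each shrinking call loses at most a constant factor Omega_c(1).
\[
  2 \;\ge\; \xi_{\mathrm{final}}
    \;\ge\; \frac{1}{4c^2}\cdot\bigl(1+\Omega(\varepsilon^2)\bigr)^{B}\cdot\Omega_c(1)^{A}
  \quad\Longrightarrow\quad
  B\cdot\ln\bigl(1+\Omega(\varepsilon^2)\bigr) \;\le\; \ln(8c^2) + O_c(A).
\]
% Use ln(1+x) >= x/2 on [0,1] and A = O_c(1/eps^4).
\[
  B \;\le\; \frac{O_c(1/\varepsilon^4)}{\Omega(\varepsilon^2)} \;=\; O_c(1/\varepsilon^6).
\]
```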


Claim 5.4. Suppose that the rounding of distances in ProcessBall has no effect (i.e., distances
from $o$ to all points are multiples of $\delta$). At any moment of time, in ProcessBall and
ProcessSphere outside of the base cases one has $r_1, r_2, R \ge \exp(-O_c(1/\varepsilon^{10}))$.

Proof. By the proof of Claim 5.3, $r_2/r_1$ is $O_c(1)$ and $r_2/R \in [\exp(-O_c(1/\varepsilon^4)); \sqrt{2}]$.

After calling ProcessSphere within ProcessBall (and vice versa) the new value of $R$ is at
least $\Omega(r_1)$, since otherwise the first base case in ProcessSphere (or the base case of ProcessBall)
will be triggered.

So, overall we have $O_c(1/\varepsilon^6)$ calls to ProcessBall in the recursion stack, each of which can
decrease $r_1, r_2, R$ by a factor of at most $\exp(O_c(1/\varepsilon^4))$. Hence the claim. □

Since $\varepsilon = 1/\log\log\log n$, we can choose $C$ in the definition of $\delta$ (see Section 5.2) so that, by
Claim 5.4, rounding distances to multiples of $\delta$ gives only a small multiplicative change to $r_1, r_2, R$
that accumulates to $1 + o_c(1)$ over $O_c(1/\varepsilon^6)$ iterations.

This way, we obtain all the items in Claim 5.2, except the last one (by staring at the above 
proofs and taking the previous paragraph into account). 

Let us show the last item for Process (for ProcessSphere the proof is the same verbatim).
Let us look at any data point $p \in P_0$. Suppose that $p$ ends up in a pseudo-random remainder. Then,
there are only $\tau m$ points in the $2c^2$-neighborhood of $p$. When we sample $\mathcal{R}$, the number of points
outside this neighborhood multiplies by a constant strictly smaller than one (in expectation). Thus,
overall, the number of points in $P$ multiplies by a constant smaller than one on average every call
to Process. It means that in $O(\log n)$ calls with high probability either $p$ will be captured by a
dense cluster or it will be the only remaining point.
Figure 4: For the proof of Claim 5.5: distances to points in $U$

5.4 Probability of success 

We now lower bound the probability of success for a query $q \in \mathbb{R}^d$ for which there exists
$p \in P_0$ with $\|q - p\| \le 1$. We perform this analysis in two steps. First, we upper bound
$\mathbb{E}_{\mathcal{R}}[|P \cap \mathcal{R}(q)| \mid \mathcal{R}(p) = \mathcal{R}(q)]$ in Process and ProcessSphere provided that $p \in P$ after
removing dense clusters. This upper bound formalizes the intuition that after removing dense clusters
the remaining pseudo-random instance becomes easy to partition using Spherical LSH. While upper
bounding this expectation for ProcessSphere, we crucially rely on the estimate (4) for triples of
points from Theorem 3.1 (see also the remark after the proof of Claim 5.5). Second, we use this
estimate to lower bound the overall probability of success by analyzing the corresponding random
process.

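Claim 5.5. Suppose that in Process or ProcessSphere, we have $p \in P$ after removing all dense
balls. Let $q \in \mathbb{R}^d$ be such that $\|p - q\| \le 1$ for Process or $\|p - q\| \le r_1$ and $q \in \partial B(o, R)$ for
ProcessSphere. Then, after sampling $\mathcal{R}$, we have

$$\frac{\ln(1/p_1)}{\ln(1/p_2)} \le \frac{1}{2c^2 - 1} + o_c(1), \tag{8}$$

where $p_1 = \Pr_{\mathcal{R}}[\mathcal{R}(p) = \mathcal{R}(q)]$, and $p_2 = \mathbb{E}_{\mathcal{R}}\left[\frac{|P \cap \mathcal{R}(q)|}{m} \;\middle|\; \mathcal{R}(p) = \mathcal{R}(q)\right]$, where $m$ is the size of $P$ at
the beginning of Process or ProcessSphere.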

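Proof. Let us first analyze Process, for which the analysis is somewhat simpler. In this case, by
Theorem 2.5, we have

$$\ln(1/p_1) = \ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(p) = \mathcal{R}(q)]} \le (1 + o(1))\sqrt{d}. \tag{9}$$

On the other hand,

$$\begin{aligned}
\frac{\mathbb{E}_{\mathcal{R}}[|P \cap \mathcal{R}(q)| \mid \mathcal{R}(p) = \mathcal{R}(q)]}{m}
&\le \tau + \frac{\mathbb{E}_{\mathcal{R}}[|(P \cap \mathcal{R}(q)) \setminus B(q, 2c^2)| \mid \mathcal{R}(p) = \mathcal{R}(q)]}{m} \\
&\le \tau + \sup_{p' \in P \setminus B(q, 2c^2)} \Pr_{\mathcal{R}}[\mathcal{R}(p') = \mathcal{R}(q) \mid \mathcal{R}(p) = \mathcal{R}(q)] \\
&\le \tau + \frac{\sup_{p' \in P \setminus B(q, 2c^2)} \Pr_{\mathcal{R}}[\mathcal{R}(p') = \mathcal{R}(q)]}{\Pr_{\mathcal{R}}[\mathcal{R}(p) = \mathcal{R}(q)]} \\
&\le \tau + e^{-(2c^2 - 1)\sqrt{d} \cdot (1 + o_c(1))} \\
&\le e^{-(2c^2 - 1)\sqrt{d} \cdot (1 + o_c(1))},
\end{aligned} \tag{10}$$

where the first step follows from the fact that after removing the dense clusters we have
$|P \cap B(q, 2c^2)| < \tau m$, the second step follows from the linearity of expectation, the fourth step follows
from Theorem 2.5 and the last step uses that $d = \Theta(\log n \log\log n)$ and $\tau = \exp(-\log^{2/3} n)$. Now,
combining (9) and (10), we get

$$\frac{\ln(1/p_1)}{\ln(1/p_2)} \le \frac{1}{2c^2 - 1} + o_c(1)$$

as desired.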

Now let us analyze ProcessSphere. By Claim 5.2 and the fact that $r_2/r_1 \le O_c(1)$ (otherwise,
we would have triggered the second base case in ProcessSphere), we have $r_1/R \ge
\exp(-O_c(\log\log\log n)^{O(1)})$, and $d = \Theta(\log n \log\log n)$, so we are in position to apply Theorem 3.1.
We have

$$\ln(1/p_1) \le \frac{(r_1/R)^2}{4 - (r_1/R)^2} \cdot \frac{\sqrt{d}}{2} \cdot (1 + o(1)) \le \frac{1}{2c^2 - 1} \cdot \frac{\sqrt{d}}{2} \cdot (1 + o_c(1)), \tag{11}$$

where the second step follows from the estimate

$$\frac{r_1}{R} = \frac{r_1}{r_2} \cdot \frac{r_2}{R} \le \frac{\sqrt{2}}{c} \cdot (1 + o_c(1)),$$

where the second step is due to the assumptions $r_2/r_1 \ge c - o_c(1)$ (which is true by Claim 5.2)
and $r_2 < \sqrt{2}R$ (if the latter was not the case, we would have triggered the third base case in
ProcessSphere). Let $\widetilde{p}$ and $\widetilde{q}$ be reflections of $p$ and $q$ respectively with respect to $o$. Define

$$U = B(q, (\sqrt{2} - \varepsilon)R) \cup B(\widetilde{q}, (\sqrt{2} - \varepsilon)R) \cup B(p, (\sqrt{2} - \varepsilon)R) \cup B(\widetilde{p}, (\sqrt{2} - \varepsilon)R).$$

Then

$$\begin{aligned}
\frac{\mathbb{E}_{\mathcal{R}}[|P \cap \mathcal{R}(q)| \mid \mathcal{R}(p) = \mathcal{R}(q)]}{m}
&\le 4\tau + \frac{\mathbb{E}_{\mathcal{R}}[|(P \cap \mathcal{R}(q)) \setminus U| \mid \mathcal{R}(p) = \mathcal{R}(q)]}{m} \\
&\le 4\tau + \sup_{p' \in P \setminus U} \Pr_{\mathcal{R}}[\mathcal{R}(p') = \mathcal{R}(q) \mid \mathcal{R}(p) = \mathcal{R}(q)] \\
&\le 4\tau + e^{-\frac{\sqrt{d}}{2} \cdot (1 + o(1))} \\
&\le e^{-\frac{\sqrt{d}}{2} \cdot (1 + o(1))},
\end{aligned} \tag{12}$$

where the first step follows from the definition of $U$ and that we have all the dense clusters removed,
the second step follows from the linearity of expectation, the third step follows from Theorem 3.1
(namely, (4)) and the fact that all the points from $P \setminus U$ are within $(\sqrt{2} \pm \Theta(\varepsilon))R$ from both $p$ and
$q$ (see Figure 4), and the fourth step uses the values for $d$ and $\tau$. Overall, combining (11) and (12),
we get

$$\frac{\ln(1/p_1)}{\ln(1/p_2)} \le \frac{1}{2c^2 - 1} + o_c(1).$$

□
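The leading terms of (11) and (12) can be compared numerically. The sketch below is an illustration under one assumption read off from those two displays: that the exponent of the Spherical LSH collision probability at distance $u$ on the unit sphere scales as $\frac{u^2}{4 - u^2} \cdot \frac{\sqrt{d}}{2}$. The function names are hypothetical.

```python
import math

def spherical_exponent(u):
    """Leading coefficient of ln(1/Pr[collision]) / (sqrt(d)/2) at distance u on the unit sphere."""
    return u * u / (4.0 - u * u)

def rho_data_dependent(c):
    """Ratio of the near exponent (u = sqrt(2)/c) to the far exponent (u = sqrt(2))."""
    near = spherical_exponent(math.sqrt(2.0) / c)
    far = spherical_exponent(math.sqrt(2.0))
    return near / far

# Compare with the closed form 1 / (2 c^2 - 1) for a few approximation factors.
values = {c: (rho_data_dependent(c), 1.0 / (2.0 * c * c - 1.0)) for c in (1.1, 1.5, 2.0, 3.0)}
```

For every $c$ in the table the two numbers agree up to rounding, which is the identity $\frac{2/c^2}{4 - 2/c^2} = \frac{1}{2c^2 - 1}$ used in the second step of (11).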


Remark: If instead of (4) we used (5), then the best we could hope for would be

$$\frac{\ln(1/p_1)}{\ln(1/p_2)} \le \frac{1}{2c^2 - 2} + o_c(1),$$

which is much worse than (8), if $c$ is close to $1$.


Claim 5.6. Suppose that $p \in P_0$ and $q \in \mathbb{R}^d$ with $\|p - q\| \le 1$. Then, the probability of success of
the data structure is at least

$$n^{-\frac{1}{2c^2 - 1} - o_c(1)}.$$


Proof. Let us prove by induction that any query of a data structure built by Process, ProcessBall
and ProcessSphere with $p \in P$ has probability of success at least $|P|^{-\rho} \cdot n^{-\alpha}$, where $\rho \le
1/(2c^2 - 1) + o_c(1)$ and $\alpha = o_c(1)$. First, we prove this for Process assuming that the same bound
holds for ProcessBall. Then, we argue that for ProcessBall and ProcessSphere essentially
the same argument works as well.

Let us denote $f(m)$ a lower bound on the probability of success for Process when $|P| \le m$ and
denote $p_1$ and $p_2$ the quantities from Claim 5.5 introduced for Process.

Let us lower bound $f(m)$ by induction. If $p$ belongs to one of the dense clusters, then $f(m) \ge
m^{-\rho} \cdot n^{-\alpha}$ by the assumption for ProcessBall. If $p$ does not belong to any dense cluster, we have

$$\begin{aligned}
f(m) &\ge \mathbb{E}_{\mathcal{R}}\left[f(|P \cap \mathcal{R}(q)|) \cdot \mathbf{1}_{\mathcal{R}(p) = \mathcal{R}(q)}\right] \\
&= \Pr_{\mathcal{R}}[\mathcal{R}(p) = \mathcal{R}(q)] \cdot \mathbb{E}_{\mathcal{R}}[f(|P \cap \mathcal{R}(q)|) \mid \mathcal{R}(p) = \mathcal{R}(q)] \\
&\ge p_1 \cdot \inf_{\substack{\operatorname{supp} X \subseteq [m]: \\ \mathbb{E}[X] \le p_2 \cdot m}} \mathbb{E}[f(X)] \\
&\ge p_1 \cdot \inf_{\substack{\operatorname{supp} X \subseteq [m]: \\ \mathbb{E}[X] \le p_2 \cdot m}} \Pr[X < m] \cdot \mathbb{E}[f(X) \mid X < m] \\
&\ge p_1 (1 - p_2) \cdot \inf_{\substack{\operatorname{supp} Y \subseteq [m - 1]: \\ \mathbb{E}[Y] \le p_2 \cdot m}} \mathbb{E}[f(Y)] \\
&\ge p_1 (1 - p_2) n^{-\alpha} \cdot \inf_{\substack{\operatorname{supp} Y \subseteq [m - 1]: \\ \mathbb{E}[Y] \le p_2 \cdot m}} \mathbb{E}[Y^{-\rho}] \\
&\ge p_1 (1 - p_2) n^{-\alpha} \cdot \inf_{\substack{\operatorname{supp} Y \subseteq [m - 1]: \\ \mathbb{E}[Y] \le p_2 \cdot m}} \mathbb{E}[Y]^{-\rho} \\
&\ge p_1 (1 - p_2) p_2^{-\rho} m^{-\rho} n^{-\alpha},
\end{aligned}$$

where the third step is due to the definition of $p_1$ and $p_2$, the fifth step is due to Markov’s inequality,
the sixth step is by the induction hypothesis and the seventh step is due to Jensen’s inequality. For
the induction to go through we must have

$$p_1 (1 - p_2) p_2^{-\rho} \ge 1, \tag{13}$$

so by Claim 5.5 and the fact that $p_2 \le 1 - \Omega(1)$ (which follows from Theorem 2.5), we can set
$\rho = \frac{1}{2c^2 - 1} + o_c(1)$.

For ProcessBall and ProcessSphere we can perform a similar induction with two modifications.
First, we need to verify the bound for the base cases in ProcessSphere (in particular,
this will determine the value of $\alpha$). Second, since ProcessBall and ProcessSphere depend on
each other, we need to carry on the “outer” induction on the number of calls to ProcessBall in
the recursion stack. The latter issue is easy to address, since by Claim 5.2 the maximum number
of calls to ProcessBall in the recursion stack is bounded, and from the above estimate it is
immediate to see that there is no “error” that would accumulate. Thus, it is only left to look at
the base cases of ProcessSphere. The first base case is trivial. For the remaining cases we use
Theorem 2.3. For the second base case we use Theorem 2.5: we are in position to apply it, since
by Claim 5.2 $r_2 \le c + o_c(1)$. As for the third case, since by Claim 5.2 and the fact that for the third
case $r_2/r_1 = O_c(1)$ one has that $r_1$ is not too small compared to $R$, we are in position to apply
Theorem 3.1. Since by Claim 5.2 $r_2/r_1 \ge c - o_c(1)$ and by the assumption $r_2 \ge \sqrt{2}R$, applying
Theorem 3.1, we get the required bound on the probability of success. □
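Condition (13) can be solved for $\rho$ explicitly, which shows how close the induction gets to the ratio bounded in Claim 5.5. The Python sketch below is illustrative: the function name and the sample values of $p_1$, $p_2$ are hypothetical and were chosen only to exercise the formula.

```python
import math

def smallest_rho(p1, p2):
    """Smallest rho with p1 * (1 - p2) * p2**(-rho) >= 1, for 0 < p1 <= 1 and 0 < p2 < 1."""
    return math.log(1.0 / (p1 * (1.0 - p2))) / math.log(1.0 / p2)

# Hypothetical collision statistics (not taken from the paper).
p1, p2 = 1e-3, 1e-9
rho = smallest_rho(p1, p2)
ideal = math.log(1.0 / p1) / math.log(1.0 / p2)  # the ratio ln(1/p1) / ln(1/p2) from Claim 5.5
slack = rho - ideal                              # extra exponent paid for the factor (1 - p2)
```

The slack equals $\ln\frac{1}{1 - p_2} / \ln\frac{1}{p_2}$, which is negligible whenever $p_2$ is small, consistent with the $o_c(1)$ term in the choice of $\rho$.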





5.5 Space and query time 

In this section we show that the expected space the data structure occupies is $n^{1 + o_c(1)}$ and the
expected query time is $n^{o_c(1)}$.

Claim 5.7. The overall expected space the data structure occupies is $n^{1 + o_c(1)}$.


Proof. Every particular point $p \in P_0$ can participate in several branches of recursion during the
preprocessing: the reason for this is ProcessBall, where every data point participates in many
sphere data structures. But observe that by Claim 5.2 every point can end up in at most

$$O_c\left(\frac{1}{\delta}\right)^{O_c(\varepsilon^{-O(1)})} = n^{o_c(1)} \tag{14}$$

branches, since there are at most $O_c(\varepsilon^{-O(1)})$ calls to ProcessBall in every branch, and each such
call introduces branching factor of at most $O((R + r_1 + 2\delta)/\delta) = O_c(1/\delta)$.

Next, for every point $p \in P_0$ and for every branch it participates in, one can see that by Claim 5.2
the total expected number of calls in the stack is at most

$$O_c(\varepsilon^{-O(1)}) \cdot O(\log n) = n^{o_c(1)}, \tag{15}$$

since the number of ProcessBall’s is at most $O_c(\varepsilon^{-O(1)})$, and they separate the runs of Process and
ProcessSphere, each of which is of length $O(\log n)$ in expectation.

Since every partition $\mathcal{R}$ sampled in Process or ProcessSphere takes $n^{o_c(1)}$ space by
Theorem 2.5 and Theorem 3.1, we get, combining (14) and (15), that the space consumption of partitions
and hash tables per point is $n^{o_c(1)}$. Also, the base cases in ProcessSphere are cheap too: indeed,
from Theorem 2.3 (coupled with Theorem 3.1 and Theorem 2.5) we see that the space consumption
for the base cases is $n^{o_c(1)}$ per point per branch.

Thus, the overall bound $n^{1 + o_c(1)}$ follows. □

Claim 5.8. For every query $q \in \mathbb{R}^d$, the expected query time is at most $n^{o_c(1)}$.

Proof. Consider a recursion tree for a particular query $q \in \mathbb{R}^d$.

First, observe that each sequence of calls to Process or ProcessSphere can spawn at most
$O(\tau^{-1} \cdot \log n)$ calls to ProcessBall, since every such call multiplies the number of remaining
points by a factor of at most $(1 - \tau)$. At the same time by Claim 5.2, each such sequence is of size
$O(\log n)$ in expectation.

Since by Claim 5.2 in the recursion stack there can be at most $O_c(\varepsilon^{-O(1)})$ calls to ProcessBall,
which induces the branching factor of at most $O_c(1/\delta)$, overall, the expected number of nodes in
the tree is at most

$$\left(\frac{\log n}{\delta \tau}\right)^{O_c(\varepsilon^{-O(1)})} = n^{o_c(1)}.$$

In every node we need to do one point location in a partition, which by Theorem 2.5 and Theorem 3.1
can be done in time $n^{o_c(1)}$, and then for the base cases we have the expected query time $n^{o_c(1)}$ (by
Theorem 2.3). Thus, the overall expected query time is $n^{o_c(1)}$. □


6 Fast preprocessing 

A priori, it is not clear how to implement the preprocessing in polynomial time, let alone near-linear. 
We will first show how to get preprocessing time to $n^{2 + o_c(1)}$ and then reduce it to $n^{1 + o_c(1)}$.

6.1 Near-quadratic time 

To get preprocessing time $n^{2 + o_c(1)}$ we need to observe that during the clustering step in Process
and ProcessSphere we may look only for balls with centers being points from $P$.

For Process it is easy to see that we can find balls of radius $4c^2$ with centers being points from
$P$. Then, the proofs of Claim 5.5 and, as a result, Claim 5.6 go through (we use that, as a result of
such a preprocessing, there are no dense clusters of radius $2c^2$ with arbitrary centers). Also, we
need to make sure that Claim 5.2 is still true, even though we start with clusters of radius $4c^2$ instead of
$2c^2$, but that is straightforward.

For ProcessSphere the situation is slightly more delicate, since we cannot afford to lose a
factor of two in the radius here. We build upon the following Lemma.

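Proposition 6.1 (van der Corput Lemma). For any $v^*, v_1, v_2, \ldots, v_n \in S^{d-1}$ one has

$$\sum_{i,j} \langle v_i, v_j \rangle \ge \Bigl|\sum_i \langle v^*, v_i \rangle\Bigr|^2.$$

Proof. We have

$$\Bigl|\sum_i \langle v^*, v_i \rangle\Bigr|^2 = \Bigl|\Bigl\langle v^*, \sum_i v_i \Bigr\rangle\Bigr|^2 \le \|v^*\|^2 \cdot \Bigl\|\sum_i v_i\Bigr\|^2 = \Bigl\|\sum_i v_i\Bigr\|^2 = \sum_{i,j} \langle v_i, v_j \rangle,$$

where the second step is an application of the Cauchy-Schwarz inequality. □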
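The inequality is easy to test numerically. The following Python sketch draws random unit vectors and compares both sides; the dimension, the number of vectors, the seed and the helper names are arbitrary assumptions made for illustration.

```python
import math
import random

def random_unit_vector(dim, rng):
    """Sample a uniformly random point of the unit sphere by normalizing a Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in g))
    return [x / norm for x in g]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(1)
dim, n = 8, 50
v_star = random_unit_vector(dim, rng)
vs = [random_unit_vector(dim, rng) for _ in range(n)]

lhs = sum(dot(vi, vj) for vi in vs for vj in vs)  # sum over all ordered pairs (i, j)
rhs = sum(dot(v_star, vi) for vi in vs) ** 2      # squared sum of correlations with v*
total = [sum(v[k] for v in vs) for k in range(dim)]  # the vector sum of all v_i
```

The left-hand side equals the squared norm of `total`, which is exactly the quantity bounded by Cauchy-Schwarz in the proof.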

The following Claim is the main estimate we use to analyze the variant of ProcessSphere, 
where we are looking only for clusters with centers in data points. Informally, we prove that if a 
non-trivially small spherical cap covers n points, then there is a non-trivially small cap centered in 
one of the points that covers a substantial fraction of points. 

Claim 6.2. Fix $\varepsilon > 0$. Suppose that $U \subset S^{d-1}$ with $|U| = n$. Suppose that there exists $u^* \in S^{d-1}$
such that $\|u^* - u\| \le \sqrt{2} - \varepsilon$ for every $u \in U$. Then, there exists $u_0 \in U$ such that

$$\bigl|\{u \in U : \|u - u_0\| \le \sqrt{2} - \Omega(\varepsilon^2)\}\bigr| \ge \Omega(\varepsilon^2 n).$$

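Proof. First, observe that $\|u^* - u\| \le \sqrt{2} - \varepsilon$ iff $\langle u^*, u \rangle \ge \Omega(\varepsilon)$. By van der Corput Lemma
(Proposition 6.1)

$$\sum_{u, v \in U} \langle u, v \rangle \ge \Bigl|\sum_{u \in U} \langle u^*, u \rangle\Bigr|^2 \ge \Omega(\varepsilon^2 n^2).$$

Thus, there exists $u_0 \in U$ such that

$$\sum_{u \in U} \langle u_0, u \rangle \ge \Omega(\varepsilon^2 n).$$

This implies

$$\bigl|\{u \in U \mid \langle u_0, u \rangle \ge \Omega(\varepsilon^2)\}\bigr| \ge \Omega(\varepsilon^2 n),$$

which is equivalent to

$$\bigl|\{u \in U \mid \|u - u_0\| \le \sqrt{2} - \Omega(\varepsilon^2)\}\bigr| \ge \Omega(\varepsilon^2 n).$$

□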


It means that if in ProcessSphere we search for clusters of radius $\sqrt{2} - \Omega(\varepsilon^2)$ centered in data
points that cover at least $\Omega(\varepsilon^2 \tau m)$ points, then after we remove all of them, we are sure that there
are no clusters of radius $\sqrt{2} - \varepsilon$ with arbitrary centers that cover at least $\tau m$ points, so Claims 5.5
and 5.6 go through. The proof of Claim 5.2 needs to be adjusted accordingly, but we claim that by
setting $C$ in the definition of $\delta$ large enough, the proof of Claim 5.2 still goes through.

It is easy to see that by reducing the time of each clustering step to near-quadratic, we reduce
the total preprocessing time to $n^{2 + o_c(1)}$. This follows from the proof of Claim 5.7 (intuitively, each
point participates in $n^{o_c(1)}$ instances of the clustering subroutine) and from the fact that it takes
time $n^{o_c(1)}$ to sample a partition $\mathcal{R}$ in Process and ProcessSphere.
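A clustering step restricted to data-point centers can be sketched as a quadratic scan. The Python code below is an illustration, not the paper's procedure: the function names are hypothetical, the threshold is passed as an absolute count, and the greedy choice of the densest ball is a simplification (any sufficiently dense ball would do).

```python
import math

def densest_data_centered_ball(points, radius):
    """Return (index, count) of the data point whose ball of the given radius covers the most points."""
    best_index, best_count = -1, 0
    for i, center in enumerate(points):
        count = sum(1 for p in points if math.dist(center, p) <= radius)
        if count > best_count:
            best_index, best_count = i, count
    return best_index, best_count

def carve_dense_clusters(points, radius, threshold):
    """Repeatedly remove data-centered balls covering at least `threshold` of the remaining points."""
    remaining, clusters = list(points), []
    while remaining:
        i, count = densest_data_centered_ball(remaining, radius)
        if count < threshold:
            break
        center = remaining[i]
        clusters.append([p for p in remaining if math.dist(center, p) <= radius])
        remaining = [p for p in remaining if math.dist(center, p) > radius]
    return clusters, remaining

# Hypothetical toy data: one tight group of three points and two isolated points.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (9.0, 9.0)]
clusters, rest = carve_dense_clusters(pts, radius=0.5, threshold=3)
```

Each scan costs a quadratic number of distance computations in the number of remaining points, which matches the near-quadratic cost per clustering step discussed above.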


6.2 Near-linear time 

To get $n^{1 + o_c(1)}$ preprocessing time, we just sample the dataset and compute dense balls in the
sample. Indeed, since we care about the clusters with at least $\varepsilon^2 \tau n = n^{1 - o(1)}$ data points, we can
sample $n^{o_c(1)}$ points from the dataset and find dense clusters for the sample. Then, using the fact
that the VC-dimension for balls in $\mathbb{R}^d$ is $O(d)$, we can argue that this sample is accurate enough
with probability at least $1 - n^{-10}$. Then, taking the union bound over all clustering steps, we are
done.
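The sampling idea can be illustrated by estimating the density of a single ball from a uniform sample. This Python sketch is only a toy: the function name, the planar data, the sample size and the seed are assumptions, and it does not reproduce the VC-dimension argument that makes the estimate uniformly accurate over all balls.

```python
import math
import random

def estimate_ball_fraction(points, center, radius, sample_size, rng):
    """Estimate the fraction of `points` inside B(center, radius) from a uniform sample without replacement."""
    sample = rng.sample(points, min(sample_size, len(points)))
    inside = sum(1 for p in sample if math.dist(center, p) <= radius)
    return inside / len(sample)

rng = random.Random(2)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
# Sampling every point gives the exact fraction; a smaller sample gives an estimate.
exact = estimate_ball_fraction(points, (0.0, 0.0), 0.5, len(points), rng)
approx = estimate_ball_fraction(points, (0.0, 0.0), 0.5, 400, rng)
```

A dense-ball search would then run on the sample with a correspondingly scaled threshold.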


7 Acknowledgments 

We thank Piotr Indyk and Sepideh Mahabadi for insightful discussions about the problem and 
for reading early drafts of this write-up. In particular, discussions with them led us to the fast 
preprocessing algorithm. 





References 


[AAKK14] Amirali Abdullah, Alexandr Andoni, Ravindran Kannan, and Robert Krauthgamer. Spectral
approaches to nearest neighbor search. In Proceedings of IEEE Symposium on Foundations of
Computer Science (FOCS ’2014), 2014.

[AI06] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest
neighbor in high dimensions. In Proceedings of the 47th Annual IEEE Symposium on Foundations
of Computer Science (FOCS ’2006), pages 459-468, 2006.

[AI08] Alexandr Andoni and Piotr Indyk. Near-optimal hashing algorithms for approximate nearest
neighbor in high dimensions. Communications of the ACM, 51(1):117-122, 2008.

[AIK+15] Alexandr Andoni, Piotr Indyk, Michael Kapralov, Thijs Laarhoven, Ilya Razenshteyn, and Ludwig
Schmidt. Practical and optimal LSH for angular distance. Manuscript, 2015.

[AINR14] Alexandr Andoni, Piotr Indyk, Huy L. Nguyen, and Ilya Razenshteyn. Beyond locality-sensitive
hashing. In Proceedings of the 25th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA ’2014), pages 1018-1028, 2014.

[And09] Alexandr Andoni. Nearest Neighbor Search: the Old, the New, and the Impossible. PhD thesis,
Massachusetts Institute of Technology, 2009.

[AR15] Alexandr Andoni and Ilya Razenshteyn. Tight lower bounds for data-dependent locality-sensitive
hashing. Available at http://arxiv.org/abs/1507.04299, 2015.

[Cha02] Moses Charikar. Similarity estimation techniques from rounding algorithms. In Proceedings of
the 34th ACM Symposium on the Theory of Computing (STOC ’2002), pages 380-388, 2002.

[Cla88] Kenneth L. Clarkson. A randomized algorithm for closest-point queries. SIAM Journal on
Computing, 17(4):830-847, 1988.

[Cou13] National Research Council. Frontiers in Massive Data Analysis. The National Academies Press,
Washington, DC, 2013.

[DF08] Sanjoy Dasgupta and Yoav Freund. Random projection trees and low dimensional manifolds. In
Proceedings of the 40th annual ACM symposium on Theory of computing, pages 537-546. ACM,
2008.

[DG03] Sanjoy Dasgupta and Anupam Gupta. An elementary proof of a theorem of Johnson and
Lindenstrauss. Random Structures and Algorithms, 22(1):60-65, 2003.

[DIIM04] Mayur Datar, Nicole Immorlica, Piotr Indyk, and Vahab S. Mirrokni. Locality-sensitive hashing
scheme based on p-stable distributions. In Proceedings of the 20th ACM Symposium on
Computational Geometry (SoCG ’2004), pages 253-262, 2004.

[Dub10] Moshe Dubiner. Bucketing coding and information theory for the statistical high-dimensional
nearest-neighbor problem. IEEE Transactions on Information Theory, 56(8):4166-4179, 2010.

[FKS84] Michael L. Fredman, János Komlós, and Endre Szemerédi. Storing a sparse table with O(1) worst
case access time. Journal of the ACM, 31(3):538-544, 1984.

[HPIM12] Sariel Har-Peled, Piotr Indyk, and Rajeev Motwani. Approximate nearest neighbor: towards
removing the curse of dimensionality. Theory of Computing, 8(1):321-350, 2012.

[IM98] Piotr Indyk and Rajeev Motwani. Approximate nearest neighbors: towards removing the curse
of dimensionality. In Proceedings of the 30th ACM Symposium on the Theory of Computing
(STOC ’1998), pages 604-613, 1998.

[JL84] William B. Johnson and Joram Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert
space. In Conference in modern analysis and probability (New Haven, Connecticut, 1982),
volume 26 of Contemporary Mathematics, pages 189-206. 1984.





[KMS98] David R. Karger, Rajeev Motwani, and Madhu Sudan. Approximate graph coloring by semidefinite
programming. Journal of the ACM, 45(2):246-265, 1998.

[LLR95] Nathan Linial, Eran London, and Yuri Rabinovich. The geometry of graphs and some of its
algorithmic applications. Combinatorica, 15(2):215-245, 1995.

[McN01] James McNames. A fast nearest-neighbor algorithm based on a principal axis search tree. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 23(9):964-976, 2001.

[Mei93] Stefan Meiser. Point location in arrangements of hyperplanes. Information and Computation,
106(2):286-303, 1993.

[MNP07] Rajeev Motwani, Assaf Naor, and Rina Panigrahy. Lower bounds on locality sensitive hashing.
SIAM Journal on Discrete Mathematics, 21(4):930-935, 2007.

[OvL81] Mark H. Overmars and Jan van Leeuwen. Worst-case optimal insertion and deletion methods for
decomposable searching problems. Information Processing Letters, 12(4):168-173, 1981.

[OWZ11] Ryan O’Donnell, Yi Wu, and Yuan Zhou. Optimal lower bounds for locality sensitive hashing
(except when q is tiny). In Proceedings of Innovations in Computer Science (ICS ’2011), pages
275-283, 2011.

[PTW08] Rina Panigrahy, Kunal Talwar, and Udi Wieder. A geometric approach to lower bounds for
approximate near-neighbor search and partial match. In Proceedings of the 49th Annual IEEE
Symposium on Foundations of Computer Science (FOCS ’2008), pages 414-423, 2008.

[PTW10] Rina Panigrahy, Kunal Talwar, and Udi Wieder. Lower bounds on near neighbor search via metric
expansion. In Proceedings of the 51st Annual IEEE Symposium on Foundations of Computer
Science (FOCS ’2010), pages 805-814, 2010.

[Raz14] Ilya Razenshteyn. Beyond Locality-Sensitive Hashing. Master’s thesis, MIT, 2014.

[SH09] Ruslan Salakhutdinov and Geoffrey E. Hinton. Semantic hashing. International Journal of
Approximate Reasoning, 50(7):969-978, 2009.

[Spr91] Robert F. Sproull. Refinements to nearest-neighbor searching in k-dimensional trees. Algorithmica,
6(1-6):579-589, 1991.

[Val12] Gregory Valiant. Finding correlations in subquadratic time, with applications to learning parities
and juntas. In Proceedings of the 53rd Annual IEEE Symposium on Foundations of Computer
Science (FOCS ’2012), pages 11-20, 2012.

[VKD09] Nakul Verma, Samory Kpotufe, and Sanjoy Dasgupta. Which spatial partition trees are adaptive
to intrinsic dimension? In Proceedings of the 25th Conference on Uncertainty in Artificial
Intelligence (UAI ’2009), pages 565-574, 2009.

[WSSJ14]

[WTF08]

[YSRL11]

Jingdong Wang, Heng Tao Shen, Jingkuan Song, and Jianqiu Ji. Hashing for similarity search: A 
survey. CoRR, abs/1408.2927, 2014. 

Yair Weiss, Antonio Torralba, and Rob Fergus. Spectral hashing. In Proceedings of 22nd Annual 
Conference on Neural Information Processing Systems (NIPS ’2008), pages 1753-1760, 2008. 

Jay Yagnik, Dennis Strelow, David A. Ross, and Ruei-Sung Lin. The power of comparative rea¬ 
soning. In Proceedings of 13th IEEE International Conference on Computer Vision (ICCV ’2011), 
pages 2431-2438, 2011. 


A Analysis of Spherical LSH 

Figure 5: The region corresponding to $X \ge d^{1/4}$ and $X \cos\alpha - Y \sin\alpha \ge d^{1/4}$. (a) The case $\alpha < \pi/2$. (b) The case $\alpha > \pi/2$. 

In this section we prove Theorem 3.1. We will use the following basic estimate repeatedly. 


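Lemma A.1 (e.g., [KMS98]). For every $t > 0$ 

$$\frac{1}{\sqrt{2\pi}} \cdot \left(\frac{1}{t} - \frac{1}{t^3}\right) \cdot e^{-t^2/2} \;\le\; \Pr_{X \sim N(0,1)}[X \ge t] \;\le\; \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{t} \cdot e^{-t^2/2}.$$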
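
As a quick numerical sanity check of Lemma A.1 (not part of the original argument), the sketch below compares the exact Gaussian tail, computed through the complementary error function, with the two bounds of the lemma. The function names `gaussian_tail` and `tail_bounds` are illustrative choices.

```python
import math


def gaussian_tail(t: float) -> float:
    """Pr[X >= t] for X ~ N(0, 1), computed via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def tail_bounds(t: float) -> tuple:
    """Lower and upper bounds on Pr[X >= t] from Lemma A.1 (requires t > 0)."""
    density = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)
    lower = (1.0 / t - 1.0 / t ** 3) * density
    upper = density / t
    return lower, upper


for t in (0.5, 1.0, 2.0, 4.0, 8.0):
    lower, upper = tail_bounds(t)
    exact = gaussian_tail(t)
    # The lower bound is negative (hence trivial) for t < 1 and tight for large t.
    assert lower <= exact <= upper
    print(f"t={t}: {lower:.4e} <= {exact:.4e} <= {upper:.4e}")
```

For $t = d^{1/4}$ the ratio of the two bounds is $1 - d^{-1/2}$, which is the source of the $(1 + O(d^{-1/2}))$ factors used throughout this appendix.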


A.1 Collision probability for a pair 


Suppose that $u, v \in S^{d-1}$ are two points on the unit sphere with angle $\alpha$ between them. Our goal is 
to estimate the probability of collision $\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]$, where $\mathcal{R}$ is a partition sampled according 
to Spherical LSH. To compute the probability of collision, observe that the probability that at a given 
iteration we “capture” either $u$ or $v$ is equal to $\Pr_{g \sim N(0,1)^d}[\langle u, g\rangle \ge d^{1/4} \text{ or } \langle v, g\rangle \ge d^{1/4}]$. Similarly, 
the probability that we capture both $u$ and $v$ is equal to $\Pr_{g \sim N(0,1)^d}[\langle u, g\rangle \ge d^{1/4} \text{ and } \langle v, g\rangle \ge d^{1/4}]$. 
The iterations are independent and identically distributed, and the first iteration that captures at least 
one of the two points decides whether they collide. Hence the probability of collision $\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]$ 
equals the probability of the event “both $u$ and $v$ are captured at a given iteration” conditioned 
on the event “either $u$ or $v$ is captured at a given iteration”. Thus, 


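$$
\begin{aligned}
\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]
&= \frac{\Pr_{g \sim N(0,1)^d}\left[\langle u, g\rangle \ge d^{1/4} \text{ and } \langle v, g\rangle \ge d^{1/4}\right]}{\Pr_{g \sim N(0,1)^d}\left[\langle u, g\rangle \ge d^{1/4} \text{ or } \langle v, g\rangle \ge d^{1/4}\right]} \\
&= \frac{\Pr_{X,Y \sim N(0,1)}\left[X \ge d^{1/4} \text{ and } X\cos\alpha - Y\sin\alpha \ge d^{1/4}\right]}{\Pr_{X,Y \sim N(0,1)}\left[X \ge d^{1/4} \text{ or } X\cos\alpha - Y\sin\alpha \ge d^{1/4}\right]} \\
&\in [0.5; 1] \cdot \frac{\Pr_{X,Y \sim N(0,1)}\left[X \ge d^{1/4} \text{ and } X\cos\alpha - Y\sin\alpha \ge d^{1/4}\right]}{\Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right]},
\end{aligned}
\tag{16}
$$

where the second step follows from the spherical symmetry of Gaussians, and the third step follows 
from the immediate equality 

$$\Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right] = \Pr_{X,Y \sim N(0,1)}\left[X\cos\alpha - Y\sin\alpha \ge d^{1/4}\right],$$

which, together with the union bound, places the denominator between $\Pr_{X \sim N(0,1)}[X \ge d^{1/4}]$ and twice that quantity. 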

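By Lemma A.1, 

$$\Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right] = \left(1 + O(d^{-1/2})\right) \cdot \frac{e^{-\sqrt{d}/2}}{\sqrt{2\pi}\, d^{1/4}}, \tag{17}$$

so, plugging (17) into (16), we have 

$$\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)] = \Theta(d^{1/4}) \cdot \frac{\Pr_{X,Y \sim N(0,1)}\left[X \ge d^{1/4} \text{ and } X\cos\alpha - Y\sin\alpha \ge d^{1/4}\right]}{e^{-\sqrt{d}/2}}. \tag{18}$$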

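For $0 \le \alpha \le \pi$ and $\lambda > 0$ denote 

$$W_{\alpha,\lambda} := \{(x, y) : x \ge \lambda \text{ and } x\cos\alpha - y\sin\alpha \ge \lambda\} \subseteq \mathbb{R}^2.$$

Thus, estimating $\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]$ boils down to computing the Gaussian measure of $W_{\alpha, d^{1/4}}$ (see 
Figure 5). 

For $d \in \mathbb{N}$, $0 \le \alpha \le \pi$ denote 

$$S(d, \alpha) := \Pr_{X,Y \sim N(0,1)}\left[(X, Y) \in W_{\alpha, d^{1/4}}\right].$$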

Claim A.2. For every $d$ the function $S(d, \alpha)$ is non-increasing in $\alpha$. 

Proof. One can check that for every $d \in \mathbb{N}$ and $0 \le \alpha' \le \alpha'' \le \pi$ we have 

$$W_{\alpha', d^{1/4}} \supseteq W_{\alpha'', d^{1/4}},$$

hence the claim. □ 
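
Claim A.2 can also be observed numerically. The sketch below is an illustration rather than part of the proof: it approximates $S(d, \alpha)$ by a midpoint rule, using the fact that for fixed $x \ge d^{1/4}$ and $0 < \alpha < \pi$ the second constraint reads $y \le (x\cos\alpha - d^{1/4})/\sin\alpha$. The function names, the step count and the truncation width `span` are assumptions made for this example.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written with erfc so that far tails stay accurate."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def wedge_measure(d: float, alpha: float, steps: int = 20000, span: float = 12.0) -> float:
    """Approximate S(d, alpha) for 0 < alpha < pi.

    S(d, alpha) is the integral over x >= lam of
    phi(x) * Phi((x * cos(alpha) - lam) / sin(alpha)), with lam = d ** 0.25.
    The integral is truncated at lam + span and evaluated by a midpoint rule.
    """
    lam = d ** 0.25
    h = span / steps
    total = 0.0
    for i in range(steps):
        x = lam + (i + 0.5) * h
        phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        total += phi * normal_cdf((x * math.cos(alpha) - lam) / math.sin(alpha))
    return total * h


alphas = [0.2 * k for k in range(1, 16)]  # 0.2, 0.4, ..., 3.0
values = [wedge_measure(16.0, a) for a in alphas]
# The measure never increases as the angle grows.
assert all(s >= t for s, t in zip(values, values[1:]))
```

The monotonicity is visible term by term: the derivative of $(x\cos\alpha - \lambda)/\sin\alpha$ with respect to $\alpha$ equals $(\lambda\cos\alpha - x)/\sin^2\alpha \le 0$ whenever $x \ge \lambda$.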

Our next goal will be estimating $S(d, \alpha)$ for $\alpha \in [d^{-\Omega(1)}; \pi - d^{-\Omega(1)}]$ within a factor polynomial in $d$. 
We would like to claim that $S(d, \alpha)$ is close to $\widetilde{S}(d, \alpha) := \Pr_{X,Y \sim N(0,1)}[X \ge d^{1/4} \text{ and } Y \le y_0]$, where 
$y_0 = -d^{1/4}\tan\frac{\alpha}{2}$ is the $y$-coordinate of the intersection of two lines: $x = d^{1/4}$ and $x\cos\alpha - y\sin\alpha = 
d^{1/4}$ (see Figure 5). But first let us compute $\widetilde{S}(d, \alpha)$; this is easy due to the independence of $X$ 
and $Y$. 

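Claim A.3. If $\alpha = \Omega(d^{-1/5})$, then 

$$\widetilde{S}(d, \alpha) \in \frac{1 \pm d^{-\Omega(1)}}{2\pi d^{1/2} \tan\frac{\alpha}{2}} \cdot \exp\left(-\left(1 + \tan^2\frac{\alpha}{2}\right) \cdot \frac{\sqrt{d}}{2}\right).$$

Proof. First, observe that since $\alpha = \Omega(d^{-1/5})$, we have $y_0 = -d^{1/4}\tan\frac{\alpha}{2} = -d^{\Omega(1)}$. Next, we have 

$$
\begin{aligned}
\widetilde{S}(d, \alpha) &= \Pr_{X,Y \sim N(0,1)}\left[X \ge d^{1/4} \text{ and } Y \le y_0\right] \\
&= \Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right] \cdot \Pr_{Y \sim N(0,1)}\left[Y \le y_0\right] \\
&\in \left(1 \pm d^{-\Omega(1)}\right) \cdot \frac{e^{-\sqrt{d}/2}}{\sqrt{2\pi}\, d^{1/4}} \cdot \frac{e^{-|y_0|^2/2}}{\sqrt{2\pi}\, |y_0|} \\
&= \frac{1 \pm d^{-\Omega(1)}}{2\pi d^{1/2} \tan\frac{\alpha}{2}} \cdot \exp\left(-\left(1 + \tan^2\frac{\alpha}{2}\right) \cdot \frac{\sqrt{d}}{2}\right),
\end{aligned}
$$

where the second step is by independence of $X$ and $Y$, the third step is due to $y_0 = -d^{\Omega(1)}$ and 
Lemma A.1, and the fourth step is due to $y_0 = -d^{1/4}\tan\frac{\alpha}{2}$. □ 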
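
To illustrate Claim A.3, the sketch below compares the exact value of $\widetilde{S}(d, \alpha)$, a product of two one-dimensional Gaussian tails, with the closed-form expression from the claim. It is a numerical illustration only; the function names and the chosen values of $d$ and $\alpha$ are assumptions of this example.

```python
import math


def gaussian_tail(t: float) -> float:
    """Pr[X >= t] for X ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def s_tilde_exact(d: float, alpha: float) -> float:
    """Pr[X >= d^(1/4)] * Pr[Y <= y0] with y0 = -d^(1/4) * tan(alpha / 2)."""
    lam = d ** 0.25
    return gaussian_tail(lam) * gaussian_tail(lam * math.tan(alpha / 2.0))


def s_tilde_asymptotic(d: float, alpha: float) -> float:
    """Main term of Claim A.3, without the (1 +- d^(-Omega(1))) factor."""
    t = math.tan(alpha / 2.0)
    return math.exp(-(1.0 + t * t) * math.sqrt(d) / 2.0) / (2.0 * math.pi * math.sqrt(d) * t)


for d in (16, 256, 10 ** 4):
    ratio = s_tilde_exact(d, math.pi / 3.0) / s_tilde_asymptotic(d, math.pi / 3.0)
    # By Lemma A.1 the ratio is at most 1 and tends to 1 as d grows.
    print(f"d={d}: exact / asymptotic = {ratio:.4f}")
```

By Lemma A.1 the ratio lies between $(1 - d^{-1/2})(1 - y_0^{-2})$ and $1$, so it approaches $1$ only once both $d^{1/4}$ and $|y_0|$ are large; this is where the assumption $\alpha = \Omega(d^{-1/5})$ enters.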

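Now the goal is to prove that if $\alpha$ is not too close to $0$ or $\pi$, then $S(d, \alpha)$ is close to $\widetilde{S}(d, \alpha)$. 

Claim A.4. If $\alpha = \Omega(d^{-1/5})$ and $\alpha = \pi - \Omega(d^{-\delta})$ for a sufficiently small $\delta > 0$, then 

$$d^{-O(1)} \le \frac{S(d, \alpha)}{\widetilde{S}(d, \alpha)} \le d^{O(1)}.$$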

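Proof. First, for $\alpha = \pi/2$ the claim is trivial, since $S(d, \pi/2) = \widetilde{S}(d, \pi/2)$. Next, we consider the 
cases $\alpha < \pi/2$ and $\alpha > \pi/2$ separately (see Figure 5). 

The case $\alpha < \pi/2$: in this case we have 

$$S(d, \alpha) - \widetilde{S}(d, \alpha) = \frac{1}{2\pi} \int_{y_0}^{\infty} \int_{x_0(y)}^{\infty} e^{-\frac{x^2 + y^2}{2}} \, dx \, dy,$$

where $x_0(y) = \frac{d^{1/4} + y\sin\alpha}{\cos\alpha}$ (see Figure 5). For every $y \ge y_0$ we have $x_0(y) \ge d^{1/4}$, so by Lemma A.1 

$$\int_{x_0(y)}^{\infty} e^{-x^2/2} \, dx \in \left(1 \pm d^{-\Omega(1)}\right) \cdot \frac{e^{-x_0(y)^2/2}}{x_0(y)}.$$

Thus, 

$$S(d, \alpha) - \widetilde{S}(d, \alpha) = \frac{1 \pm d^{-\Omega(1)}}{2\pi} \int_{y_0}^{\infty} \frac{e^{-\frac{y^2 + x_0(y)^2}{2}}}{x_0(y)} \, dy.$$

By direct computation, 

$$y^2 + x_0(y)^2 = y^2 + \left(\frac{d^{1/4} + y\sin\alpha}{\cos\alpha}\right)^2 = \left(\frac{d^{1/4}\sin\alpha + y}{\cos\alpha}\right)^2 + d^{1/2}.$$

Let us perform the following change of variables: 

$$u = \frac{d^{1/4}\sin\alpha + y}{\cos\alpha}.$$

Under this change $y = y_0$ corresponds to $u = d^{1/4}\tan\frac{\alpha}{2}$ and $x_0(y) = d^{1/4}\cos\alpha + u\sin\alpha$. We have 

$$
\begin{aligned}
S(d, \alpha) - \widetilde{S}(d, \alpha) &= \frac{\left(1 \pm d^{-\Omega(1)}\right) e^{-\sqrt{d}/2}}{2\pi} \int_{d^{1/4}\tan\frac{\alpha}{2}}^{\infty} \frac{e^{-u^2/2}}{d^{1/4} + u\tan\alpha} \, du \\
&\le \frac{\left(1 \pm d^{-\Omega(1)}\right) e^{-\sqrt{d}/2}}{2\pi d^{1/4}} \int_{d^{1/4}\tan\frac{\alpha}{2}}^{\infty} e^{-u^2/2} \, du \\
&= \left(1 \pm d^{-\Omega(1)}\right) \widetilde{S}(d, \alpha),
\end{aligned}
$$

where the last step is due to Lemma A.1 and Claim A.3. Overall, we have 

$$\widetilde{S}(d, \alpha) \le S(d, \alpha) \le \left(2 \pm d^{-\Omega(1)}\right) \widetilde{S}(d, \alpha).$$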


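The case $\alpha > \pi/2$: this case is similar. We have 

$$
\begin{aligned}
\widetilde{S}(d, \alpha) - S(d, \alpha) &= \frac{1}{2\pi} \int_{-\infty}^{y_0} \int_{x_0(y)}^{\infty} e^{-\frac{x^2 + y^2}{2}} \, dx \, dy \\
&= \frac{1 \pm d^{-\Omega(1)}}{2\pi} \int_{-\infty}^{y_0} \frac{e^{-\frac{y^2 + x_0(y)^2}{2}}}{x_0(y)} \, dy.
\end{aligned}
$$

After the change 

$$u = \frac{d^{1/4}\sin\alpha + y}{\cos\alpha}$$

we get (note that $\cos\alpha < 0$) 

$$
\begin{aligned}
\widetilde{S}(d, \alpha) - S(d, \alpha) &= \frac{\left(1 \pm d^{-\Omega(1)}\right) e^{-\sqrt{d}/2}}{2\pi} \int_{d^{1/4}\tan\frac{\alpha}{2}}^{\infty} \frac{e^{-u^2/2}}{u\,|\tan\alpha| - d^{1/4}} \, du \\
&\le \frac{\left(1 \pm d^{-\Omega(1)}\right) e^{-\sqrt{d}/2}}{2\pi d^{1/4} \left(|\tan\alpha| \tan\frac{\alpha}{2} - 1\right)} \int_{d^{1/4}\tan\frac{\alpha}{2}}^{\infty} e^{-u^2/2} \, du \\
&= \frac{1 \pm d^{-\Omega(1)}}{|\tan\alpha| \tan\frac{\alpha}{2} - 1} \cdot \widetilde{S}(d, \alpha).
\end{aligned}
$$

For $\pi/2 < \alpha < \pi$ the double-angle formula gives $|\tan\alpha| \tan\frac{\alpha}{2} - 1 = 1/|\cos\alpha|$. Since $\alpha = \pi - \Omega(d^{-\delta})$, we have 

$$\frac{1}{|\tan\alpha| \tan\frac{\alpha}{2} - 1} = \frac{1}{1 + \Omega(d^{-2\delta})} = 1 - \Omega(d^{-2\delta}).$$

We can choose $\delta$ such that 

$$\widetilde{S}(d, \alpha) - S(d, \alpha) \le \left(1 - d^{-O(1)}\right) \widetilde{S}(d, \alpha),$$

so 

$$d^{-O(1)} \cdot \widetilde{S}(d, \alpha) \le S(d, \alpha) \le \widetilde{S}(d, \alpha).$$ 
□ 

Combining (18), Claim A.2, Claim A.3, Claim A.4 and the formula 

$$\tan^2\frac{\alpha}{2} = \frac{\|u - v\|^2}{4 - \|u - v\|^2},$$

which holds because $\|u - v\| = 2\sin\frac{\alpha}{2}$ for unit vectors at angle $\alpha$, 
we get the first two bullet points from the statement of Theorem 3.1. 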
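
The two cases in the proof of Claim A.4 can be checked numerically for moderate $d$. The sketch below is an illustration under stated assumptions, not the authors' computation: it evaluates $|S - \widetilde{S}|$ through the one-dimensional integral that remains after integrating out $x$ exactly, and reports the ratio $S/\widetilde{S}$. The function names, step count and truncation width are choices made for this example.

```python
import math


def gaussian_tail(t: float) -> float:
    """Pr[X >= t] for X ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def s_tilde(d: float, alpha: float) -> float:
    """Pr[X >= d^(1/4) and Y <= y0], a product of two tails."""
    lam = d ** 0.25
    return gaussian_tail(lam) * gaussian_tail(lam * math.tan(alpha / 2.0))


def gap(d: float, alpha: float, steps: int = 20000, span: float = 14.0) -> float:
    """|S - S~| as the integral of phi(y) * Pr[X >= x0(y)].

    The range is y >= y0 when alpha < pi/2 and y <= y0 when alpha > pi/2;
    it is truncated to width `span` and evaluated by a midpoint rule.
    """
    lam = d ** 0.25
    y0 = -lam * math.tan(alpha / 2.0)
    direction = 1.0 if alpha < math.pi / 2.0 else -1.0
    h = span / steps
    total = 0.0
    for i in range(steps):
        y = y0 + direction * (i + 0.5) * h
        x0 = (lam + y * math.sin(alpha)) / math.cos(alpha)
        phi = math.exp(-y * y / 2.0) / math.sqrt(2.0 * math.pi)
        total += phi * gaussian_tail(x0)
    return total * h


def ratio(d: float, alpha: float) -> float:
    """S(d, alpha) / S~(d, alpha) for alpha in (0, pi), alpha != pi/2."""
    g = gap(d, alpha) / s_tilde(d, alpha)
    return 1.0 + g if alpha < math.pi / 2.0 else 1.0 - g


for alpha in (math.pi / 6, math.pi / 3, 2 * math.pi / 3, 5 * math.pi / 6):
    print(f"alpha={alpha:.3f}: S / S~ = {ratio(16.0, alpha):.4f}")
```

For $\alpha < \pi/2$ the ratio stays between $1$ and roughly $2$, and for $\alpha > \pi/2$ it stays in $(0, 1)$ and shrinks as $\alpha$ approaches $\pi$, matching the two conclusions of the proof.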


A.2 Three-way collision probabilities 

In this section we prove (4). We start with a simpler case $\varepsilon = 0$. 


A.2.1 The case $\varepsilon = 0$ 

Suppose we have $u, v, w \in S^{d-1}$ with $\|u - v\| = \tau$ with $\tau \le 1.99$, $\|u - w\|, \|v - w\| = \sqrt{2}$. Our goal 
is to upper bound 

$$\Pr_{\mathcal{R}}\left[\mathcal{R}(u) = \mathcal{R}(w) \mid \mathcal{R}(u) = \mathcal{R}(v)\right].$$

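Similarly to the two-point case, we have 

$$
\begin{aligned}
\Pr_{\mathcal{R}}&[\mathcal{R}(u) = \mathcal{R}(v) = \mathcal{R}(w)] \\
&= \frac{\Pr_{g \sim N(0,1)^d}\left[\langle u, g\rangle \ge d^{1/4} \text{ and } \langle v, g\rangle \ge d^{1/4} \text{ and } \langle w, g\rangle \ge d^{1/4}\right]}{\Pr_{g \sim N(0,1)^d}\left[\langle u, g\rangle \ge d^{1/4} \text{ or } \langle v, g\rangle \ge d^{1/4} \text{ or } \langle w, g\rangle \ge d^{1/4}\right]} \\
&= \frac{\Pr_{X,Y,Z \sim N(0,1)}\left[X \ge d^{1/4} \text{ and } Y\sqrt{1 - \frac{\tau^2}{4}} + Z \cdot \frac{\tau}{2} \ge d^{1/4} \text{ and } Y\sqrt{1 - \frac{\tau^2}{4}} - Z \cdot \frac{\tau}{2} \ge d^{1/4}\right]}{\Pr_{X,Y,Z \sim N(0,1)}\left[X \ge d^{1/4} \text{ or } Y\sqrt{1 - \frac{\tau^2}{4}} + Z \cdot \frac{\tau}{2} \ge d^{1/4} \text{ or } Y\sqrt{1 - \frac{\tau^2}{4}} - Z \cdot \frac{\tau}{2} \ge d^{1/4}\right]} \\
&= \Theta(1) \cdot \frac{\Pr_{X,Y,Z \sim N(0,1)}\left[X \ge d^{1/4} \text{ and } Y\sqrt{1 - \frac{\tau^2}{4}} + Z \cdot \frac{\tau}{2} \ge d^{1/4} \text{ and } Y\sqrt{1 - \frac{\tau^2}{4}} - Z \cdot \frac{\tau}{2} \ge d^{1/4}\right]}{\Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right]} \\
&= \Theta(1) \cdot \Pr_{Y,Z \sim N(0,1)}\left[Y\sqrt{1 - \frac{\tau^2}{4}} + Z \cdot \frac{\tau}{2} \ge d^{1/4} \text{ and } Y\sqrt{1 - \frac{\tau^2}{4}} - Z \cdot \frac{\tau}{2} \ge d^{1/4}\right] \\
&= \Theta(1) \cdot \Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right] \cdot \Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)],
\end{aligned}
$$

where the first step is similar to the two-point case, the second step is by the spherical symmetry of 
Gaussians, the third step is due to the following immediate identity 

$$\Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right] = \Pr_{Y,Z \sim N(0,1)}\left[Y\sqrt{1 - \frac{\tau^2}{4}} + Z \cdot \frac{\tau}{2} \ge d^{1/4}\right] = \Pr_{Y,Z \sim N(0,1)}\left[Y\sqrt{1 - \frac{\tau^2}{4}} - Z \cdot \frac{\tau}{2} \ge d^{1/4}\right],$$

the fourth step is due to the independence of $X$, $Y$ and $Z$, and the last step is due to (16). Thus, 

$$\Pr_{\mathcal{R}}\left[\mathcal{R}(u) = \mathcal{R}(w) \mid \mathcal{R}(u) = \mathcal{R}(v)\right] = \frac{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v) = \mathcal{R}(w)]}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]} = \Theta(1) \cdot \Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right],$$

which, combined with Lemma A.1, gives the desired claim. 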
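
The coordinates behind the second step can be made explicit. The sketch below uses one convenient choice of $u, v, w \in S^2$ (an assumption of this example; any rotation of it works equally well) and checks that the pairwise distances are as required and that the inner products with $g = (X, Y, Z)$ take the form used in the displayed computation.

```python
import math


def orthogonal_configuration(tau: float):
    """Unit vectors u, v, w in R^3 with |u - v| = tau and |u - w| = |v - w| = sqrt(2)."""
    h = math.sqrt(1.0 - tau * tau / 4.0)
    w = (1.0, 0.0, 0.0)
    u = (0.0, h, tau / 2.0)
    v = (0.0, h, -tau / 2.0)
    return u, v, w


def distance(a, b) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner(a, b) -> float:
    return sum(x * y for x, y in zip(a, b))


tau = 1.2
u, v, w = orthogonal_configuration(tau)
h = math.sqrt(1.0 - tau * tau / 4.0)

# Distances match the setting of Section A.2.1.
assert math.isclose(distance(u, v), tau)
assert math.isclose(distance(u, w), math.sqrt(2.0))
assert math.isclose(distance(v, w), math.sqrt(2.0))

# For g = (X, Y, Z): <w, g> = X and <u, g>, <v, g> = Y * h +- Z * tau / 2.
g = (0.3, -1.1, 2.0)  # an arbitrary fixed vector standing in for a Gaussian sample
assert math.isclose(inner(w, g), g[0])
assert math.isclose(inner(u, g), g[1] * h + g[2] * tau / 2.0)
assert math.isclose(inner(v, g), g[1] * h - g[2] * tau / 2.0)
```

Since $w$ is orthogonal to both $u$ and $v$, the coordinate $X$ is independent of $(Y, Z)$, which is exactly what the fourth step uses.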


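A.2.2 The case of arbitrary $\varepsilon$ 

Suppose that $u, v, w \in S^{d-1}$ are such that $\|u - v\| = \tau$ with $\tau \le 1.99$, $\|u - w\|, \|v - w\| \in \sqrt{2} \pm \varepsilon$ 
with $\varepsilon = o(1)$. We would like to show (4) for this case. In a nutshell, the goal is to show that the 
bound proved in Section A.2.1 is stable under small perturbations in $\varepsilon$. 

We are interested in lower bounding 

$$\ln\frac{1}{\Pr_{\mathcal{R}}\left[\mathcal{R}(u) = \mathcal{R}(w) \mid \mathcal{R}(u) = \mathcal{R}(v)\right]} = \ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v) = \mathcal{R}(w)]} - \ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]}. \tag{19}$$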

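First, if $\tau \le \varepsilon^{\nu}$, where $\nu > 0$ is a small positive constant (to be chosen later), then we can 
proceed as in (5): we use that 

$$\ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v) = \mathcal{R}(w)]} \ge \ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(w)]} \ge \left(1 - \varepsilon^{\Omega(1)} - d^{-\Omega(1)}\right) \cdot \frac{\sqrt{d}}{2}, \tag{20}$$

due to $\|u - w\| \ge \sqrt{2} - \varepsilon$ and (2). On the other hand 

$$\ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]} \le \left(\varepsilon^{\Omega(1)} + d^{-\Omega(1)}\right) \cdot \sqrt{d}, \tag{21}$$

due to $\|u - v\| \le \varepsilon^{\nu}$ and (3). Thus, combining (19), (20) and (21), we are done. 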

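Thus, we can assume that $\tau \ge \varepsilon^{\nu}$ for a small $\nu > 0$. Due to the spherical symmetry of Gaussians, 
we can assume that $u, v, w \in S^2$. Let $u', v', w' \in S^2$ be such that $\|u' - w'\| = \|v' - w'\| = \sqrt{2}$ and 
$\|u' - v'\| = \tau$. From Section A.2.1 we know that 

$$\ln\frac{1}{\Pr_{\mathcal{R}}\left[\mathcal{R}(u') = \mathcal{R}(w') \mid \mathcal{R}(u') = \mathcal{R}(v')\right]} \ge \left(1 - d^{-\Omega(1)}\right) \cdot \frac{\sqrt{d}}{2};$$

besides that, 

$$\ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v)]} = \ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u') = \mathcal{R}(v')]},$$

since $\|u - v\| = \|u' - v'\| = \tau$; thus, it is sufficient to show that 

$$\ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v) = \mathcal{R}(w)]} \ge \ln\frac{1}{\Pr_{\mathcal{R}}[\mathcal{R}(u') = \mathcal{R}(v') = \mathcal{R}(w')]} - \left(\varepsilon^{\Omega_{\nu}(1)} + d^{-\Omega(1)}\right) \cdot \sqrt{d}, \tag{22}$$

provided that $\nu > 0$ is small enough. Recall that 

$$\Pr_{\mathcal{R}}[\mathcal{R}(u) = \mathcal{R}(v) = \mathcal{R}(w)] = \frac{\Pr_{g \sim N(0,1)^3}\left[\langle u, g\rangle \ge d^{1/4} \wedge \langle v, g\rangle \ge d^{1/4} \wedge \langle w, g\rangle \ge d^{1/4}\right]}{\Pr_{g \sim N(0,1)^3}\left[\langle u, g\rangle \ge d^{1/4} \vee \langle v, g\rangle \ge d^{1/4} \vee \langle w, g\rangle \ge d^{1/4}\right]}, \tag{23}$$

$$\Pr_{\mathcal{R}}[\mathcal{R}(u') = \mathcal{R}(v') = \mathcal{R}(w')] = \frac{\Pr_{g \sim N(0,1)^3}\left[\langle u', g\rangle \ge d^{1/4} \wedge \langle v', g\rangle \ge d^{1/4} \wedge \langle w', g\rangle \ge d^{1/4}\right]}{\Pr_{g \sim N(0,1)^3}\left[\langle u', g\rangle \ge d^{1/4} \vee \langle v', g\rangle \ge d^{1/4} \vee \langle w', g\rangle \ge d^{1/4}\right]}. \tag{24}$$

Observe that the denominators in (23) and (24) are within a factor 3 from each other. Thus, it is 
sufficient to prove that 

$$
\begin{aligned}
\ln&\frac{1}{\Pr_{g \sim N(0,1)^3}\left[\langle u, g\rangle \ge d^{1/4} \wedge \langle v, g\rangle \ge d^{1/4} \wedge \langle w, g\rangle \ge d^{1/4}\right]} \\
&\ge \ln\frac{1}{\Pr_{g \sim N(0,1)^3}\left[\langle u', g\rangle \ge d^{1/4} \wedge \langle v', g\rangle \ge d^{1/4} \wedge \langle w', g\rangle \ge d^{1/4}\right]} - \left(\varepsilon^{\Omega_{\nu}(1)} + d^{-\Omega(1)}\right) \cdot \sqrt{d},
\end{aligned}
$$

provided that $\nu > 0$ is small enough. 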

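The joint distribution of $\langle u', g\rangle$, $\langle v', g\rangle$ and $\langle w', g\rangle$ is a multivariate Gaussian with zero mean 
and covariance matrix 

$$D = \begin{pmatrix} 1 & 1 - \frac{\tau^2}{2} & 0 \\ 1 - \frac{\tau^2}{2} & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.$$

Similarly, for $\langle u, g\rangle$, $\langle v, g\rangle$ and $\langle w, g\rangle$ the mean is zero and the covariance matrix is 

$$C = \begin{pmatrix} 1 & 1 - \frac{\tau^2}{2} & \pm O(\varepsilon) \\ 1 - \frac{\tau^2}{2} & 1 & \pm O(\varepsilon) \\ \pm O(\varepsilon) & \pm O(\varepsilon) & 1 \end{pmatrix}.$$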

Observe that both $C$ and $D$ have all eigenvalues being at least $\varepsilon^{O(\nu)}$ and at most $O(1)$ by 
Gershgorin’s theorem and due to the bound $\tau \ge \varepsilon^{\nu}$. In particular, both $C$ and $D$ are invertible. 
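
The eigenvalue bounds can be made concrete. The sketch below computes the Gershgorin interval of a symmetric matrix and applies it to one instance of the perturbed covariance matrix; it also verifies the closed-form eigenpairs of the unperturbed matrix. The helper names and the sample values of $\tau$ and of the perturbation `delta` (standing for an $O(\varepsilon)$ entry) are assumptions of this example.

```python
import math


def gershgorin_interval(m):
    """Interval [low, high] that contains every eigenvalue of the symmetric matrix m."""
    lows, highs = [], []
    for i, row in enumerate(m):
        radius = sum(abs(x) for j, x in enumerate(row) if j != i)
        lows.append(row[i] - radius)
        highs.append(row[i] + radius)
    return min(lows), max(highs)


def covariance(tau: float, delta: float = 0.0):
    """delta = 0 gives the matrix D; a nonzero delta gives one instance of C."""
    c = 1.0 - tau * tau / 2.0
    return [[1.0, c, delta], [c, 1.0, delta], [delta, delta, 1.0]]


def mat_vec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]


tau, delta = 0.5, 0.01
low, high = gershgorin_interval(covariance(tau, delta))
# Rows 1 and 2 give 1 -+ (|1 - tau^2/2| + delta); row 3 gives 1 -+ 2 * delta.
print(low, high)

# D has eigenvalues 2 - tau^2/2, tau^2/2 and 1, with these eigenvectors.
for value, vector in (
    (2.0 - tau * tau / 2.0, [1.0, 1.0, 0.0]),
    (tau * tau / 2.0, [1.0, -1.0, 0.0]),
    (1.0, [0.0, 0.0, 1.0]),
):
    image = mat_vec(covariance(tau), vector)
    assert all(math.isclose(a, value * b, abs_tol=1e-12) for a, b in zip(image, vector))
```

With $\tau \ge \varepsilon^{\nu}$, $\tau \le 1.99$ and `delta` of order $\varepsilon$, the lower end of the interval is $\min(\tau^2/2,\, 2 - \tau^2/2) - O(\varepsilon) \ge \varepsilon^{O(\nu)}$ once $\nu$ is small enough, which is the bound used in the text.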

Definition A.5. For a closed subset $U \subseteq \mathbb{R}^d$ we denote by $\mu(U)$ the (Euclidean) distance from $0$ to $U$. 

In order to prove (22) we need to show that the probability that the centered Gaussian with 
covariance matrix $C$ belongs to the set $T = \{(x, y, z) \in \mathbb{R}^3 : x \ge d^{1/4}, y \ge d^{1/4}, z \ge d^{1/4}\}$ is not 
much larger than the same probability for the centered Gaussian with covariance matrix $D$. Using 
the results of Section B, we get 

$$
\begin{aligned}
\Pr_{g \sim N(0,1)^3}&\left[\langle u, g\rangle \ge d^{1/4} \wedge \langle v, g\rangle \ge d^{1/4} \wedge \langle w, g\rangle \ge d^{1/4}\right] = \Pr_{x \sim N(0,C)}[x \in T] \\
&\le O(1) \cdot \frac{e^{-\mu(C^{-1/2}T)^2/2}}{\mu(C^{-1/2}T)} \le O(1) \cdot \frac{e^{-\left(1 - \varepsilon^{\Omega_{\nu}(1)}\right) \cdot \mu(D^{-1/2}T)^2/2}}{\mu(D^{-1/2}T)}.
\end{aligned}
\tag{25}
$$
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Now observe that by the results of Section A.2.1 we have 

Pi 9 ~;v(o,i ) 3 [<«', 9) > d l/A A (v', g) > d 1/4 A (w', g) > d 1/4 ] = 2 . ( 26 ) 

Combining (25) and (26), we are done. 

A.3 Efficiency 

As has been already observed, the partitioning scheme as described in Section 3 is not efficient. One 
can fix this issue as follows. Instead of checking whether $\bigcup \mathcal{R} \ne S^{d-1}$, we can just stop after $T$ 
iterations. If we choose $T$ such that the probability of the event “$\bigcup \mathcal{R} \ne S^{d-1}$” after $T$ steps is less 
than $e^{-d^{100}}$, then we are done, since the bounds stated in Theorem 3.1 would remain true. 

For a fixed point $u \in S^{d-1}$ we have 

$$\Pr_{g \sim N(0,1)^d}\left[\langle u, g\rangle \ge d^{1/4}\right] = \Pr_{X \sim N(0,1)}\left[X \ge d^{1/4}\right] \ge e^{-O(\sqrt{d})},$$

where the first step is by 2-stability of Gaussians and the second step is by Lemma A.1. 

Now it is not hard to see, taking the union bound over a sufficiently fine $\varepsilon$-net, that if we set 
$T = e^{O(\sqrt{d})}$, then we get what we want. This way, we conclude the time and space bounds in 
Theorem 3.1. 
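
The truncated procedure can be sketched in a few lines. The code below is a minimal illustration, assuming the partition is only needed on a finite set of unit vectors: each point is assigned to the first Gaussian cap $\{x : \langle x, g\rangle \ge d^{1/4}\}$ that captures it, and sampling stops after `max_iters` iterations. The function names, the early exit once every given point is captured, and the use of `None` for points that were never captured are assumptions of this sketch, not details taken from Section 3.

```python
import math
import random


def sample_truncated_partition(points, max_iters, rng):
    """Return, for each point, the index of the first iteration that captures it.

    A point stays None if no cap captures it within max_iters iterations.
    """
    d = len(points[0])
    threshold = d ** 0.25
    assignment = [None] * len(points)
    uncaptured = set(range(len(points)))
    for iteration in range(max_iters):
        if not uncaptured:
            break  # later caps cannot change the parts of captured points
        g = [rng.gauss(0.0, 1.0) for _ in range(d)]
        newly_captured = [
            i for i in uncaptured
            if sum(p * x for p, x in zip(points[i], g)) >= threshold
        ]
        for i in newly_captured:
            assignment[i] = iteration
            uncaptured.discard(i)
    return assignment


def random_unit_vector(d, rng):
    x = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in x))
    return [t / norm for t in x]


rng = random.Random(0)
d = 16
points = [random_unit_vector(d, rng) for _ in range(50)]
points.append(list(points[0]))  # an exact duplicate must land in the same part
assignment = sample_truncated_partition(points, max_iters=5000, rng=rng)
assert assignment[0] == assignment[-1]
```

For $d = 16$ a single cap captures a fixed point with probability $\Pr[X \ge 2] \approx 0.023$, so a few thousand iterations already leave a given point uncaptured only with negligible probability; the argument in the text makes the same trade-off with $T = e^{O(\sqrt{d})}$ and a union bound over a net.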


B Perturbed Gaussian measures 

Let $A \in \mathbb{R}^{d \times d}$ be a symmetric positive-definite matrix. Consider a centered $d$-variate Gaussian 
$N(0, A)$, whose covariance matrix equals $A$, and let $S \subseteq \mathbb{R}^d$ be a closed convex set. Then 

$$
\begin{aligned}
\Pr_{x \sim N(0,A)}[x \in S] &= \Pr_{y \sim N(0,I)}\left[A^{1/2} y \in S\right] = \Pr_{y \sim N(0,I)}\left[y \in A^{-1/2} S\right] \\
&\le \Pr_{z \sim N(0,1)}\left[z \ge \mu(A^{-1/2} S)\right] \\
&\le \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\mu(A^{-1/2} S)} \cdot e^{-\mu(A^{-1/2} S)^2/2},
\end{aligned}
\tag{27}
$$

where the first step is due to properties of multivariate Gaussians, the third step is due to the 
spherical symmetry of $N(0, I)$ and due to the convexity of $S$ (and hence $A^{-1/2} S$); the last step is 
due to Lemma A.1. 
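
For a half-space the quantities in (27) have closed forms, which gives a simple way to check the bound. If $S = \{x : \langle a, x\rangle \ge b\}$ with $b > 0$, then $A^{-1/2} S = \{y : \langle A^{1/2} a, y\rangle \ge b\}$, so $\mu(A^{-1/2} S) = b / \sqrt{a^{\top} A a}$, and $\langle a, x\rangle$ is distributed as $N(0, a^{\top} A a)$. The sketch below evaluates both sides; the matrix, the vector `a`, the offset `b` and the function names are illustrative assumptions.

```python
import math


def gaussian_tail(t: float) -> float:
    """Pr[Z >= t] for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def quadratic_form(cov, a) -> float:
    """a^T cov a for a square matrix given as a list of rows."""
    return sum(a[i] * cov[i][j] * a[j] for i in range(len(a)) for j in range(len(a)))


def halfspace_mu(cov, a, b: float) -> float:
    """Distance from 0 to cov^(-1/2) S for S = {x : <a, x> >= b}, b > 0."""
    return b / math.sqrt(quadratic_form(cov, a))


def halfspace_probability(cov, a, b: float) -> float:
    """Pr[<a, x> >= b] for x ~ N(0, cov)."""
    return gaussian_tail(b / math.sqrt(quadratic_form(cov, a)))


def bound_27(mu: float) -> float:
    """Right-hand side of (27)."""
    return math.exp(-mu * mu / 2.0) / (math.sqrt(2.0 * math.pi) * mu)


cov = [[2.0, 0.5], [0.5, 1.0]]  # symmetric positive-definite
a, b = (1.0, -1.0), 3.0
mu = halfspace_mu(cov, a, b)
assert halfspace_probability(cov, a, b) <= bound_27(mu)
print(mu, halfspace_probability(cov, a, b), bound_27(mu))
```

For half-spaces the third step of (27) holds with equality, so the only loss comes from Lemma A.1; for a general closed convex set $S$ the supporting half-space at the point of $A^{-1/2} S$ closest to the origin contains $A^{-1/2} S$, which is what the convexity assumption is used for.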

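Now suppose that $A \succeq \lambda I$, where $\lambda > 0$, and $B \in \mathbb{R}^{d \times d}$ is such that $\|B - A\| \le \varepsilon$ for some $\varepsilon > 0$ 
with $\varepsilon \ll \lambda$. Clearly, similarly to (27), we get 

$$\Pr_{x \sim N(0,B)}[x \in S] \le \frac{1}{\sqrt{2\pi}} \cdot \frac{1}{\mu(B^{-1/2} S)} \cdot e^{-\mu(B^{-1/2} S)^2/2}. \tag{28}$$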

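Definition B.1. Two matrices $A$ and $B$ are called $\varepsilon$-spectrally close, if 

$$e^{-\varepsilon} \cdot A \preceq B \preceq e^{\varepsilon} \cdot A.$$

We would like to claim that the right-hand sides of (27) and (28) are quite close. For this it is 
sufficient to compare $\mu(A^{-1/2} S)$ and $\mu(B^{-1/2} S)$. Since 

$$\|A^{-1} B - I\| \le \|A^{-1}\| \cdot \|B - A\| \le \frac{\varepsilon}{\lambda},$$

we get that $A$ and $B$ are $O(\varepsilon/\lambda)$-spectrally close. Thus, $A^{-1}$ and $B^{-1}$ are $O(\varepsilon/\lambda)$-spectrally close. 
Finally, $A^{-1/2}$ and $B^{-1/2}$ are $O(\varepsilon/\lambda)$-spectrally close as well. Thus, 

$$\mu(B^{-1/2} S) \in \left(1 \pm O\left(\frac{\varepsilon}{\lambda}\right)\right) \cdot \mu(A^{-1/2} S),$$

and 

$$\Pr_{x \sim N(0,B)}[x \in S] \le \frac{1}{\sqrt{2\pi}} \cdot \frac{1 \pm O(\varepsilon/\lambda)}{\mu(A^{-1/2} S)} \cdot e^{-(1 \pm O(\varepsilon/\lambda)) \cdot \mu(A^{-1/2} S)^2/2}.$$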
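
Definition B.1 and the first closeness claim can be tried out on a small example. The sketch below checks the two Loewner-order inequalities for $2 \times 2$ symmetric matrices, using the fact that such a matrix is positive semidefinite exactly when its trace and determinant are non-negative. The matrices, the choice $\eta = -\ln(1 - \varepsilon/\lambda)$ and the function names are assumptions of this example; that choice of $\eta$ suffices because $B - e^{-\eta} A \succeq ((1 - e^{-\eta})\lambda - \varepsilon) I$ and $e^{\eta} A - B \succeq ((e^{\eta} - 1)\lambda - \varepsilon) I$.

```python
import math


def is_psd_2x2(m, tol: float = 1e-12) -> bool:
    """A symmetric 2x2 matrix is PSD iff its trace and determinant are non-negative."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return trace >= -tol and det >= -tol


def scaled_difference(p, q, scale: float):
    """Return the matrix p - scale * q."""
    return [[p[i][j] - scale * q[i][j] for j in range(2)] for i in range(2)]


def spectrally_close(a, b, eta: float) -> bool:
    """Check exp(-eta) * a <= b <= exp(eta) * a in the Loewner order."""
    lower_ok = is_psd_2x2(scaled_difference(b, a, math.exp(-eta)))
    upper_ok = is_psd_2x2(scaled_difference([[math.exp(eta) * x for x in row] for row in a], b, 1.0))
    return lower_ok and upper_ok


lam, eps = 1.0, 0.01
a = [[1.0, 0.0], [0.0, 3.0]]   # a >= lam * I
b = [[1.0, eps], [eps, 3.0]]   # spectral norm of b - a equals eps
eta = -math.log(1.0 - eps / lam)  # roughly eps / lam

assert spectrally_close(a, b, eta)
assert not spectrally_close(a, b, eps / (10.0 * lam))
```

The second assertion shows that closeness at a scale much smaller than $\varepsilon/\lambda$ fails for this pair, so the $O(\varepsilon/\lambda)$ rate cannot be improved in general.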





